1. Field of Invention
This invention relates to the planning and delivery of radiation for the clinical treatment of disease.
2. Description of Prior Art
Radiation treatment is an established technique that is one of the major treatment modalities for cancer, along with surgery and chemotherapy. Conventional techniques for external beam radiation therapy have required a human planner to first define the parameters of the radiation beams, such as position, direction, and shape. Then the radiation delivered by the resulting beam configuration is simulated, and the resulting radiation dose is evaluated for clinical acceptability. If not acceptable, the beam parameters are modified and the process iterates. This process is referred to as forward planning.
Intensity Modulate Radiation Treatment (IMRT) is a more advanced technique that has been developed as a means of achieving better coverage over the tumor (target) and better sparing of nearby critical organs (organs-at-risk). The beams that are delivered in IMRT treatments are defined in terms of a 2-dimensional modulated intensity map across the beam profile. Each pixel in the intensity map is associated with a distinct beamlet, which is the part of the beam's fluence that is passing through that pixel. Each beamlet has an intensity value proportional to the amount of fluence passing through that pixel. FIG. 7 graphically depicts a single beam with sixty-four beamlets having various modulated intensity values. The intensity values are represented in the figure by grayscale shading within the intensity map pixels.
A human planner can in some simple cases directly define the intensity map, in a process similar to forward planning. However, it is generally recognized that an automated algorithm for determining an optimal intensity map, given a set of clinical constraints (such as radiation dose prescription information), is the most robust approach to IMRT planning. This is referred to as inverse planning.
The workflow for inverse planning of an IMRT plan is given in FIG. 1. This follows the process from acquiring planning CT scans 101 and defining target volumes and organs-at-risk using the planning scans 102. Then the parameters for all treatment beams are determined by the user 103. The dose that would be delivered by all beamlets comprising the beams is simulated 104 using radiation dose computation algorithms that are well known in the art.
Then the prescription and other parameters for the optimization objective function are specified by the user 105, and these are used in the actual optimization algorithm 106 to generate the optimal plan. The resulting plan is evaluated by the user 107. If the resulting plan is not acceptable, the optimization parameters are modified and the process repeats 108. Once the resulting plan is acceptable, the plan parameters are sent to the radiation delivery system 109 for delivery to the patient.
As can be seen, the human planner must still generally iterate with the inverse planning algorithm, as there is no guarantee that the algorithm will generate a clinically acceptable plan. However, the iterations are hoped to be both less complex and fewer in number than that which would be expected for a forward planning scenario. It is highly desirable for the inverse planning algorithm to be as fast and accurate as possible, as this allows the human planner to quickly modify parameters and then evaluate the resulting optimized plan.
Inverse Planning and Optimization
An IMRT inverse planning algorithm essentially works by determining the optimal set of beam parameters satisfying a figure of merit for the plan. If the figure of merit can be expressed as a computable function possessing certain properties, the inverse planning problem can be formulated as a general optimization problem with the figure-of-merit function serving as the optimization's objective function.
In the case of IMRT plans, each beam has a unique intensity map that varies across the two dimensional plane parallel to the beam's central axis. The intensity map is typically represented as a set of sample values on a two-dimensional grid. Each pixel on this intensity map grid then corresponds to the intensity of an individual beamlet. The total beam is comprised of all of the weighted beamlets. FIG. 7 graphically depicts the relationship between beamlets and a treatment beam. All beamlets within the beam originate from a common source point 701. Each beamlet has an intensity value, with some being delivered with a high intensity 702, some with a moderate intensity 703, and some with a low intensity 704.
The dimensionality of such an objective function can be very large. Depending on the resolution of the beamlet representation, it can be hundreds or thousands of dimensions for a clinically realistic plan.
Optimization and the Objective Function
The generalized optimization problem involves determining the input parameters for a specified scalar-valued objective function such that the function assumes a globally minimum (maximum) value. A number of robust algorithms have been developed that can be applied to the general optimization problem [Press et. al. (1993)]. Each of these algorithms has strengths and weaknesses, and the effectiveness of a particular algorithm depends, at least partially, on the properties of the objective function.
The objective function acts as a map for the optimization algorithm, guiding it either to a global optimum or, if acceptable, a local optimum which is suitably close to the global. The number of iterations needed to find the optimum, and the quality of the optimum if sub-global, again often depends on properties of the objective function.
One property of the objective function that can play a significant role in the performance of the optimization algorithm is the ability to efficiently evaluate the mathematical gradient of the objective function. The gradient of a function at a point is the direction of steepest ascent at that point. Knowing the gradient for a high-dimensional objective function can significantly reduce the time needed to search for an optimum, because it specifies the direction along which an optimum may be found. Without gradient information, the optimization may spend an inordinate amount of time searching in “useless” directions.
A second important property of the objective function is referred to as directional conjugacy. This is also a significant issue for problems with large input parameter dimensionality. Directional conjugacy becomes an issue when the objective function has a large number of long, narrow valleys that are not orthogonal to the parameter coordinate axes, or to each other. As the optimization proceeds, each of the iterations is typically carried out along a particular direction. If the objective function is not directionally conjugate, the optimization along a particular direction may “spoil” previous optimizations along other directions. Thus the optimization algorithm has to re-optimize along the previous directions, resulting in wasted iterations.
Clearly an advantageous objective function for inverse planning would possess both of these properties: an efficiently computable gradient, and good directional conjugacy. Of course, the objective function also would need to do a good job as a figure of merit for treatment plans—i.e. clinically more desirable treatment plans would need to be scored better by the objective function than less desirable ones.
Linear Programming Approaches
The linear programming approaches are based on the observation that the sum of weighted beamlets to form total dose can be formulated as a (possibly large) matrix problem. In this case, the matrix relates beamlets to particular voxels, so the matrix is a linear transformation from beamlet weight state vector to dose matrix state vector.
Then, if a target dose matrix can be formed, the corresponding beamlet weights can be computed by multiplying by the inverse of the matrix. In the general case where the matrix is not square, this becomes a matrix pseudoinverse problem. Such a problem is a type of least squares optimization with a purely quadratic objective function. A quadratic objective function is ideal with respect to the aforementioned criteria: efficiently computable gradient and directional conjugacy.
However, this approach immediately leads to problems with respect to the clinical acceptability (and physical plausibility) of the solution, because the resulting beamlet weight vector is likely to contain physically unrealizable negative weights.
U.S. Pat. No. 6,882,702 to Luo (2005) proposes a solution for this issue. The disclosed invention uses iterations that gradually remove negative weights from the problem. This is based on a modified objective function that includes a term to penalize negative beamlet weights. The iterations continue until the contribution to the total radiation dose due to negative beamlet weights is minimize. This alteration in the objective function still allows efficient gradient calculation, but compromises directional conjugacy.
Furthermore, one of the requirements of this approach is that an initial “ideal” radiation dose distribution in 3-dimensions be synthesized as an input to the objective function. In many clinical situations, specifying the relevant statistical characteristics of the desired dose distribution, such as min/max dose to volumes-of-interest (VOIs), is preferable to physically spelling out the intended target distribution in 3-dimensions.
Quadratic Objective Functions
One of earliest methods of specifying the statistical characteristics of the desired dose distribution is using quadratic dose-based objective functions [Bortfeld et. al. (1997)]. The gradient for these objective functions are easy to compute, and they possess good directional conjugacy as the quadratic terms are often nearly orthogonal to each other. The resulting optimizations can be implemented readily using a standard gradient-based algorithm, such as the steepest descent or conjugate gradient algorithms.
However, the clinical acceptability of the simple quadratic objective functions is an issue. For the target regions, the quadratic objective function can express the desirability of a minimum target dose, with high dose homogeneity. For organs-at-risk, however, it is more difficult to effectively capture the clinically desirable dose distribution. This is because organs-at-risk can often be partially irradiated without significant side effects. Thus it is often advantageous to allow partial volume irradiation of organs-at-risk in order to increase target coverage. Specifying that partial volume irradiation is allowable for organs-at-risk is difficult using quadratic dose-based objective functions.
U.S. Pat. No. 6,560,311 to Shepard et. al. (2003) uses a quadratic dose-based objective function as the first preferred embodiments. In the described optimization process, at each iteration the parameter values (e.g. beamlet weights) are changed and the objective function is re-evaluated. The new parameter values are kept only if the new objective function value is better than the previous parameter values. This optimization approach is intrinsically less efficient than utilizing gradient information to first choose the direction of optimization.
U.S. Pat. No. 6,735,277 to McNutt et. al. (2004) also specifies a preferred embodiment with a quadratic objective function. The preferred optimization is a quasi-Newton gradient-based optimization. The disclosed invention is a method of dividing the beamlet weights into independent groups, and optimization each group in succession. This is equivalent to partitioning the original high-dimensional objective function in to a set of independent lower-dimensional objective functions. Then, after finding the optimum of all of the lower-dimensional objective functions, these subordinate solutions are recombined into a single solution. This solution is hoped to approximate an optimum for the original problem. The sufficiency of this approximation depends upon the extent of interdependencies among the beamlet weight parameters, which may be extensive for complex target geometries.
Partial Volume Constraints
In order to represent the partial volume effects for organs-at-risk, the quadratic objective functions have been extended by various authors to include partial volume constraints [Starckschall et. al. (2001)]. Whereas a typical quadratic term might express the constraint “deliver at least 80 Gy to the target region”, a partial volume constraint might express “deliver no more than 40 Gy to no more than 25% of the organ-at-risk”. By combining multiple terms of this form, it becomes possible to describe the expected shape of the distribution of radiation dose to the target or organ-at-risk. This distribution is typically depicted for clinical purposes as the dose-volume histogram (DVH). In its differential, relative form, the dose-volume histogram is the curve that shows the percent volume of the region of interest that receives a certain amount of radiation dose.
Starckschall et. al. (2001) combined the concept of partial volume constraints with a gradient-based optimization algorithm. In order to evaluate the gradient, however, they numerically evaluate the partial derivatives of the objective function. This is a computationally costly approach that involves evaluating the objective function for a given set of parameters, and then changing each parameter one-by-one by a small amount and re-evaluating the objective function. By doing this for all dimensions, it is possible to evaluate the gradient for the entire parameter space. However, for a beamlet weight-based objective function the amount of time needed to numerically evaluate the gradient can be quite large.
Also, the proliferation of terms in the objective function as more partial volume points are added results in an increasingly complex objective function. However, limiting the partial volume points to a small number can degrade the directional conjugacy of the objective function, as the partial volume constraints correspond to components of the objective function that are often not orthogonal to each other.
U.S. Pat. No. 6,560,311 to Shepard et. al. (2003) also specifies an alternative embodiment using an objective function with partial volume constraints, based on formulae disclosed by Bortfeld et. al. 1997. There is no disclosure of an efficient method of evaluating the gradient of the enhanced objective function, and the iterative optimization algorithm does not make use of gradient information.
U.S. Pat. No. 6,038,283 to Carol et. al. (2000) uses a objective function based on the cumulative dose-volume histogram (CDVH). The method of formulating the CDVH, by division into piecewise-linear regions and then assigning weights for each region, appears to be equivalent to the partial volume constraint approach. The optimization method disclosed, simulated annealing, is a stochastic iterative optimization algorithm that provides no opportunity to take advantage of explicitly computed gradient information.
Direct Gradient Evaluation with Partial Volume Constraints
Hristov et. al. (2002) builds on the Starckschall method and demonstrates that, in theory, it is possible to directly evaluate the gradient of an objective function that incorporates partial volume constraints. To do this, they build up an analytical expression for the objective function using the Heaviside function (unit step function). The use of the Heaviside function follows from their expression of the partial volume constraints in terms of the cumulative dose-volume histogram (CDVH). This follows the practice of Starkschall and others, and is consistent with the clinical practice of using the cumulative forms of the DVH for evaluating plans.
There is a significant theoretical issue with using Heaviside functions in the construction of an objective function. The derivative of the Heaviside function is the Dirac mass, which is a generalized function that assumes a value of zero everywhere except at the origin (where its value is undefined). Because the objective function is intended to act as a map to guide the optimization algorithm, it is desirable for its gradient to yield useful information about the direction that the optimization should proceed. However, in the pure theoretical case, a gradient built up from Dirac masses is unable to point in any direction at all (as it is always zero everywhere except at a finite number of infinitesimal points, where it is undefined).
To address this problem Hristov et. al. uses a discrete approximation to the Dirac mass, the discrete impulse function. This provides some guidance to the optimizer, but it results in an objective function that is piece-wise linear. Such an objective function can be highly fragmented and suffers from serious directional conjugacy problems. This would only be exacerbated by the addition of more partial volume points.
PDF-Based Optimization
Lian et. al. (2003) pursues a conceptually distinct approach by proposing that the target DVH for a particular structure could be well-represented using probability density functions (PDFs). They recognize the value of formulating the objective function using the differential form of the DVH over the cumulative form. The advantage is that the differential DVH (when normalized) is essentially a PDF for dose distribution. This is more consistent with a statistical analysis approach, and provides benefits in the formulation of the objective function over the Heaviside-based objective function that Hristov et. al. employed.
Their approach begins with unimodal gaussian distributions as the target differential DVH (PDF), so that the corresponding target cumulative DVH was the error function shifted to the expected dose for the structure. This has advantages for both target and organ-at-risk, as in reality both types of structures never achieve perfect Heaviside-like distributions. By varying the sigma of the gaussian PDF, different effective slopes can be modeled, which is useful for representing the partial volume effects of an organ-at-risk provided that the set of corresponding partial-volume constraint points lie roughly on a sigmoidal curve.
Additionally, they explore more sophisticated target PDFs, with puzzling results. This included the peculiar asymmetrical under-dosing problem, which could have been due to problems in the method by which they were sampling their prior distribution. It is difficult to ascertain the underlying cause of this problem, as there is an absence of information disclosed about the exact form of the optimization employed.
It is not clear whether any attempt is made to explicitly incorporate gradient information, and no description is given as to how the gradient could be calculated given the PDF formulation. It appears that the approach utilizes a stochastic, Monte-Carlo-like optimization that randomly samples from the possible plan states and then computes the corresponding objective function value. This would probably suffer considerably from scalability problems (moving from 2D to 3D, for instance).