The invention relates generally to closed loop control systems, and more particularly to a machine learning controller for medical devices that automatically optimizes the stimulations delivered by cardiac pacemakers and ICD devices.
Implanted pacemakers and implantable cardioverter defibrillators (ICDs) deliver therapy to patients suffering from various heart diseases. Congestive heart failure (CHF) is defined generally as the inability of the heart to deliver enough blood to meet the metabolic demand. CHF is often caused by electrical conduction defects, and the overall result is a reduced stroke volume from the left side of the heart. Cardiac output is known to depend strongly on the left heart contracting in synchrony with the right heart, as described inter alia in U.S. Pat. No. 6,223,079, issued Apr. 24, 2001 to Bakels et al., entitled “Bi-Ventricular Pacing Method”, the entire contents of which are incorporated herein by reference. CHF patients are therefore often implanted with a bi-ventricular pacemaker with electrodes in three chambers. The bi-ventricular pacemaker is arranged to re-synchronize the left heart contraction to the right heart contraction, resulting in an effective therapy. The re-synchronization task demands exact pacing management of the heart chambers, primarily focused on accurate timing, such that the overall stroke volume is maximized for a given heart rate (HR), the key point being to bring the left ventricle to contract in synchrony with the right ventricle. The re-synchronization task is patient and activity dependent, and thus for each patient the combination of pacing time intervals that best restores synchrony varies during the normal daily activities of the patient.
The positioning of the implanted leads in the right and left ventricles is another important contributor to the success of cardiac resynchronization therapy (CRT) devices. World Intellectual Property Organization Patent Publication WO2006/0016822, published to ROM on Jun. 15, 2006 and entitled “Optimizing and Monitoring Cardiac Resynchronization Therapy Devices”, the entire contents of which are incorporated herein by reference, describes a method to find and validate optimal lead positioning during implantation based on an adaptive CRT control system.
Q-learning (QL) is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment. Watkins and Dayan, in an article entitled “Q-Learning” published in Machine Learning 8, 279-292, 1992, showed that online solution of the QL recursive formula is guaranteed to converge to the optimal policy in a model-free reinforcement learning problem.
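By way of illustration only, the QL recursive formula can be sketched as the tabular update Q(s, a) ← Q(s, a) + α[r + γ max Q(s′, a′) − Q(s, a)]. The sketch below is a minimal example under stated assumptions and is not taken from any of the cited references: the five-state chain environment, the function names `q_learning` and `chain_step`, and the values chosen for the learning rate, discount factor and exploration rate are all hypothetical.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, steps_per_episode=50,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; `step(state, action)` returns (next_state, reward)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = rng.randrange(n_states)
        for _ in range(steps_per_episode):
            # Epsilon-greedy choice: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Model-free update: only the observed transition is used.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def chain_step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    A reward of 1 is given whenever the move ends in the rightmost state.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(5)]
```

In this toy environment the learned greedy policy is expected to select the rightward action in every state, and the learner never consults the transition rule inside `chain_step` other than by sampling it.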
D. O'Donnell et al. reported in “Long-Term Variations in Optimal Programming of Cardiac Resynchronization Devices”, published in PACE 28; Jan. 2005; 24-26, the results of a clinical study with 40 CHF patients. The authors found that the optimal atrio-ventricular (AV) delay and inter-ventricular (VV) interval, obtained using echocardiography, varied significantly during 9 months of patient follow-ups. The authors attributed the results to a slow and gradual improvement in cardiac function, due to implanted cardiac resynchronization therapy (CRT) devices, that generated a reverse remodeling of the left ventricle.
Whinnett et al., in “Haemodynamic Effects of Changes in AV and VV Delay in Cardiac Resynchronization Therapy Show a Consistent Pattern: Analysis of Shape, Magnitude and Relative Importance of AV and VV delay”, published online in Heart, 18 May 2006, doi:10.1136/hrt.2005.080721, studied the dependence on AV and VV timings in CRT patients using non-invasive systolic blood pressure (SBP) measurements. The authors propose a Gaussian fit for the measured SBP as a function of AV and VV such that the maximum SBP value is at the optimal AV and VV delay timings specific to each CRT patient. Whinnett et al. report that at higher heart rates, produced with higher rate atrial pacing, the response to variations in AV and VV timings is more significant, and hence the Whinnett Gaussian fit surface is usually obtained at higher heart rates.
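For illustration, a Gaussian response surface of the kind described above can be sketched as follows. This is a minimal example under stated assumptions and does not reproduce the fitting procedure of Whinnett et al.: the function name `gaussian_sbp`, the baseline and peak pressures, the surface widths, the optimum at an AV delay of 120 ms and a VV interval of 0 ms, and the grid of candidate timings are all hypothetical values chosen for the example.

```python
import math


def gaussian_sbp(av_ms, vv_ms, peak_sbp=120.0, baseline_sbp=100.0,
                 av_opt=120.0, vv_opt=0.0, av_width=40.0, vv_width=30.0):
    """Hypothetical Gaussian surface: SBP (mmHg) versus AV delay and VV interval (ms)."""
    exponent = ((av_ms - av_opt) ** 2 / (2 * av_width ** 2)
                + (vv_ms - vv_opt) ** 2 / (2 * vv_width ** 2))
    return baseline_sbp + (peak_sbp - baseline_sbp) * math.exp(-exponent)


# Candidate timings (assumed for the example), in milliseconds.
av_grid = range(40, 241, 20)
vv_grid = range(-60, 61, 20)

# The optimal timings are read off as the location of the surface maximum.
best_av, best_vv = max(((av, vv) for av in av_grid for vv in vv_grid),
                       key=lambda p: gaussian_sbp(*p))
```

With a single-peaked surface of this form, locating the optimum reduces to finding the one maximum of the fitted function, which is why the quality of the fit matters.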
Whinnett et al., in “The Atrioventricular Delay of Cardiac Resynchronization Can be Optimized Hemodynamically During Exercise and Predicted from Resting Measurements”, published 2008 in Heart Rhythm, Vol. 5, pages 378-386, showed that CRT patients are more symptomatic at high heart rates and hence it is important to optimize the CRT device both at rest and at higher heart rates. The authors further propose a method to calculate the optimal AV delay and VV interval timing during exercise using measurements taken at rest, in normal sinus rhythm and with atrial pacing at higher rates.
The proposal by Whinnett is a Gaussian fit obtained offline by post-processing the measured SBP data, which are averaged over several cardiac cycles before each change of pacing configuration. The raw SBP data are noisy and do not readily reveal the underlying surface.
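The role of averaging can be illustrated with a short sketch, given here as an assumption-laden example rather than the processing actually used by Whinnett. The helper name `average_sbp_per_setting`, the three pacing settings, their underlying SBP levels, the noise level and the number of beats per setting are all hypothetical, and the beat-by-beat readings are synthetic.

```python
import random
from statistics import mean


def average_sbp_per_setting(samples):
    """Group beat-by-beat SBP readings by (AV, VV) setting and average each group."""
    grouped = {}
    for setting, sbp in samples:
        grouped.setdefault(setting, []).append(sbp)
    return {setting: mean(values) for setting, values in grouped.items()}


# Synthetic data: hypothetical underlying SBP (mmHg) per (AV ms, VV ms) setting.
rng = random.Random(1)
true_sbp = {(100, 0): 112.0, (120, 0): 118.0, (140, 0): 114.0}
samples = [(setting, level + rng.gauss(0.0, 8.0))  # beat-to-beat noise, SD 8 mmHg
           for setting, level in true_sbp.items()
           for _ in range(200)]

averaged = average_sbp_per_setting(samples)
best_setting = max(averaged, key=averaged.get)
```

In this synthetic case the noise on a single beat is larger than the difference between settings, so individual readings cannot rank the settings reliably, whereas the per-setting averages can.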
Michel Zuber et al., in “Atrioventricular and Interventricular Delay in BiVentricular Pacing”, published January 2008 on-line on behalf of the European Society of Cardiology, used an external non-invasive heart sound method to show the dependence on AV and VV timing at rest in CRT patients, and obtained more complex contours with many local maxima and no pronounced global maximum.
Although the surfaces described by Whinnett are not perfect Gaussian fits, and differ from patient to patient and, for an individual patient, with heart rate, they nonetheless show the hemodynamic response to the two major CRT timing parameters, the AV delay and the VV interval, and thus remain a useful way to characterize a CRT patient and to optimize a CRT device accordingly.
However, Whinnett and Zuber describe several serious problems that prevent the production of a simple algorithm for finding the optimal AV delay and VV interval for each CRT patient. In particular, Whinnett teaches that the Gaussian fit is hard to obtain at low heart rates, i.e. at rest, and Zuber indicates that such a fit may not have a clear global maximum at all. Whinnett teaches that AV and VV optimization is especially important at higher heart rates, where CRT patients are more symptomatic, and was able to show that at higher heart rates a surface formed from the AV and VV parameters is more easily fitted with a Gaussian having a global maximum. However, the optimal values at high heart rates are different from those at resting heart rates, and since the Gaussian fit at the resting heart rate may have no global maximum, the proposal by Whinnett to use the rest condition delays with a correction term for higher rates is not straightforward.
World Intellectual Property Organization Patent Publication WO2005/007075, published to ROM on Jan. 27, 2005 and entitled “Adaptive Resynchronization Therapy System”, the entire contents of which are incorporated herein by reference, is addressed to an adaptive CRT device in which the AV delays and the VV intervals are changed online by the implanted device, which acts to perform dynamic optimization of the AV delay and the VV intervals. The adaptive CRT device utilizes hemodynamic sensor feedback online and converges to the optimal values using a spiking neuron network and a trial and error gradient ascent algorithm. However, the described adaptive CRT system does not guarantee convergence to optimal pacing therapy.
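One general reason why a trial and error ascent may fail to reach the optimum is the presence of local maxima of the kind reported by Zuber. The sketch below illustrates this with a simple greedy hill climb, which is a stand-in chosen for the example and is not the spiking neuron network algorithm of WO2005/007075. The two-peak surface, the function names `two_peak_response` and `hill_climb`, the peak locations and the 5 ms step size are all hypothetical.

```python
import math


def two_peak_response(av_ms, vv_ms):
    """Hypothetical surface: global peak at (120, 0), smaller local peak at (200, 40)."""
    main = 20.0 * math.exp(-((av_ms - 120) ** 2 + vv_ms ** 2) / (2 * 25.0 ** 2))
    side = 8.0 * math.exp(-((av_ms - 200) ** 2 + (vv_ms - 40) ** 2) / (2 * 15.0 ** 2))
    return main + side


def hill_climb(response, start, step_ms=5, max_iters=1000):
    """Greedy trial and error ascent: try each neighbouring setting, keep the best."""
    current = start
    for _ in range(max_iters):
        av, vv = current
        neighbours = [(av + step_ms, vv), (av - step_ms, vv),
                      (av, vv + step_ms), (av, vv - step_ms)]
        best = max(neighbours, key=lambda p: response(*p))
        if response(*best) <= response(*current):
            return current  # no neighbour improves: a (possibly local) maximum
        current = best
    return current


from_near_global = hill_climb(two_peak_response, start=(100, -20))
from_near_local = hill_climb(two_peak_response, start=(210, 50))
```

On this surface the climb started near the smaller peak stops there, although a setting with a higher response exists elsewhere, so the result of a purely local search depends on its starting point.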
There is therefore a long-felt need to develop a systematic closed loop control system in which the therapeutic stimulation parameters are automatically adjusted so as to deliver safe and optimally performing stimulation therapy. More specifically, there is a need to develop a method for online dynamic optimization of the AV delay and VV interval for cardiac resynchronization that will converge to the optimal stimulation timings automatically with any CRT surface, including one that has smaller local maxima in addition to a global optimum. The method preferably should deliver optimal therapy both at high heart rates, where CRT patients are more symptomatic, and at lower, resting heart rates.