The performance of a brain computer interface (BCI) can be optimized by considering the simultaneous adaptation of both a human learner and a machine learner. Preferably, adaptation of both learners occurs on-line and in (near) real-time. The human and machine learners are assumed to process data sequentially, with the human learner gating the response of the machine learner. The gating by the human learner captures the dynamic switching between task-dependent strategies, while the machine learner constructs the mappings between brain signals and control signals for a given strategy (or set of strategies). The human and machine co-learn in that they adapt simultaneously to minimize an error metric or, equivalently, maximize a bit rate.
In a typical BCI system, signals are acquired from a human learner, or subject, through one or more modalities (electroencephalography (EEG), magnetoencephalography (MEG), chronic electrode arrays, etc.). A key element of a BCI system is a machine learning or pattern recognition module that interprets the measured brain activity and maps it to a set of control signals or, equivalently, a representation for communication, e.g., a visual display.
In addition to the machine learner, the human learner is integral to a BCI system. Adaptation of the human learner is often implicit; for example, humans will switch strategies (e.g., think left/right versus up/down) based on their perceived performance. This dynamic switching by the human learner can make adaptation of the machine learner challenging, particularly since it can be viewed as making the input to the machine learner more non-stationary. Since the overall challenge in BCI is to maximize performance of the combined human-machine system (i.e., minimize error rate or, equivalently, maximize bit rate), an approach is required that jointly optimizes the two learners.
Conventional analysis of brain activity using EEG and MEG sensors often relies on averaging over multiple trials to extract statistically relevant differences between two or more experimental conditions. Trial averaging is often used in brain imaging to mitigate low signal-to-interference ratios (SIR). For example, it is the basis for analysis of event-related potentials (ERPs) as explained in Coles M. G. H. et al., “Event-related brain potentials: An introduction,” Electrophysiology of Mind. Oxford: Oxford University Press (1995). However, for some encephalographic applications, such as seizure prediction, trial averaging is problematic. One application where the problem of trial averaging is immediately apparent is the brain computer interface (BCI), i.e., interpreting brain activity for real-time communication. In the simplest case, where one wishes to communicate a binary decision, averaging corresponds to asking the same question over multiple trials and averaging the subject's binary responses. In order to obtain high-bandwidth communication, it is desirable to do as little averaging over time or across trials as possible.
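The trade-off between averaging and bandwidth in the binary case can be illustrated with a small calculation. The following is a minimal sketch under stated assumptions, not a model taken from any of the cited works: each trial is assumed to be decoded correctly and independently with probability p, averaging is modeled as a majority vote over n trials, and the communication rate is approximated by the capacity of a binary symmetric channel divided by the number of trials used. The function names are hypothetical.

```python
from math import comb, log2


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that the majority of n independent trials is correct.

    p is the assumed per-trial accuracy; n must be odd so a majority exists.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


def bits_per_trial(p: float, n: int) -> float:
    """Binary symmetric channel capacity after voting, per trial consumed."""
    q = majority_vote_accuracy(p, n)
    if q in (0.0, 1.0):
        capacity = 1.0  # a noiseless binary decision carries one bit
    else:
        capacity = 1.0 + q * log2(q) + (1 - q) * log2(1 - q)
    return capacity / n


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, majority_vote_accuracy(0.8, n), bits_per_trial(0.8, n))
```

In this toy setting with p = 0.8, voting over three trials raises the accuracy of each decision but lowers the number of bits conveyed per trial, which is the sense in which averaging costs bandwidth.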
More generally, single-trial analysis of brain activity is important in order to uncover the origin of response variability, for instance, in analysis of error-related negativity (ERN). The ERN is a negative deflection in the EEG following perceived incorrect responses (Gehring, W. J. et al., “A neural system for error detection and compensation,” Psychological Science, 4(6):385-390 (1993); Falkenstein, M. et al., “ERP components on reaction errors and their functional significance: A tutorial,” Biological Psychology, 51:87-107 (2000)) or expected losses (Gehring, W. J. et al., “The medial frontal cortex and the rapid processing of monetary gains and losses,” Science, 295:2279-2282 (2002)) in a forced-choice task. Single-trial detection of the ERN has been proposed as a means of correcting communication errors in a BCI system (Schalk et al., “EEG-based communication: presence of an error potential,” Clinical Neurophysiology, 111:2138-2144 (2000)). With the ability to analyze the precise timing and amplitude of the ERN on individual trials, one can begin to study parameters that cannot be controlled across trials, such as reaction time or error perception. Such an approach opens up new possibilities for studying the behavioral relevance and neurological origin of the ERN.
With the large number of sensors on a single subject in high-density EEG and magnetoencephalography (MEG), e.g., 32 or more sensors, an alternative approach to trial averaging is to integrate information over space rather than across trials. A number of methods along these lines have been proposed. Blind source separation analyzes the multivariate statistics of the sensor data to identify spatial linear combinations that are statistically independent over time (Makeig et al., “Independent component analysis of electroencephalographic data,” Advances in Neural Information Processing Systems, 8:145-151, MIT Press (1996); Vigario et al., “Independent component approach to the analysis of EEG and MEG recordings,” IEEE Transactions on Biomedical Engineering, 47(5):589-593 (2000); Tang et al., “Localization of Independent Components of Magnetoencephalography in Cognitive Tasks,” Neural Computation, 14(8):1827-1858 (2002)). Separating independent signals and removing noise sources and artifacts increases SIR. However, blind source separation does not exploit the timing information of external events that is often available. In most current experimental paradigms, subjects are prompted with external stimuli to which they are asked to respond. The timing of the stimuli, as well as the timing of overt responses, is therefore available, but is generally not exploited by the analysis method.
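The idea of integrating over space rather than across trials can be made concrete with a deliberately idealized simulation. The sketch below is an assumption-laden illustration and not blind source separation: it assumes every sensor observes the same source with unit gain plus independent Gaussian noise, and it uses equal weights as the spatial linear combination. Real EEG and MEG data have sensor-dependent gains and correlated noise, so the weights would have to be estimated. All function names and parameter values are hypothetical.

```python
import random
from statistics import pvariance


def simulate_single_trial(n_sensors, n_samples, noise_std, seed=0):
    """One trial: a boxcar source seen by every sensor with independent noise."""
    rng = random.Random(seed)
    source = [1.0 if 40 <= t < 60 else 0.0 for t in range(n_samples)]
    sensors = [
        [s + rng.gauss(0.0, noise_std) for s in source]
        for _ in range(n_sensors)
    ]
    return source, sensors


def spatial_projection(sensors, weights):
    """Weighted sum across sensors at each time sample (a spatial filter)."""
    n_samples = len(sensors[0])
    return [
        sum(w * channel[t] for w, channel in zip(weights, sensors))
        for t in range(n_samples)
    ]


def residual_variance(estimate, source):
    """Variance of the error between an estimate and the true source."""
    return pvariance([e - s for e, s in zip(estimate, source)])


if __name__ == "__main__":
    source, sensors = simulate_single_trial(32, 100, noise_std=2.0)
    pooled = spatial_projection(sensors, [1.0 / 32] * 32)
    print("single sensor:", residual_variance(sensors[0], source))
    print("32-sensor projection:", residual_variance(pooled, source))
```

Under these assumptions the noise variance of the projection falls roughly in proportion to the number of sensors, without using more than one trial.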
In the context of a BCI system, many methods have applied linear and nonlinear classification to a set of features extracted from the EEG. For example, adaptive autoregressive models have been used to extract features across a limited number of electrodes, with features combined using either linear or nonlinear classifiers to identify the activity from the time course of individual sensors (Pfurtscheller, G. et al., “Motor imagery and direct brain-computer communication,” Proceedings of the IEEE, 89(7):1123-1134 (2001)). Others have proposed to combine sensors in space by computing maximum and minimum eigenvalues of the sensor covariance matrices. The eigenvalues, which capture the power variations of synchronization and desynchronization, are then combined nonlinearly to obtain binary classification (Ramoser et al., “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Transactions on Rehabilitation Engineering, 8(4):441-446 (2000)). Spatial filtering has also been used to improve the signal-to-noise ratio (SNR) of oscillatory activity. However, there has been no systematic effort to choose optimal spatial filters. In the context of the ERN, Gehring et al. (1993) use linear discrimination to identify characteristic time courses in individual electrodes, but do not exploit spatial information. Although many of the aforementioned methods obtain promising performance in classifying covert (purely mental) processes, their neurological interpretation remains obscure.
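To make the notion of covariance eigenvalues as power features concrete, the following minimal sketch computes the maximum and minimum eigenvalues of a sensor covariance matrix in the two-sensor case, where a closed form exists. It is only an illustration of the quantity involved and is not the procedure of Ramoser et al.; the function names are hypothetical, and a practical system would use many sensors and a numerical eigensolver.

```python
from math import sqrt


def covariance_2x2(x, y):
    """Sample covariance entries (var_x, cov_xy, var_y) of two channels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("channels must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return var_x, cov_xy, var_y


def eigenvalues_2x2(var_x, cov_xy, var_y):
    """(max, min) eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]]."""
    mean = (var_x + var_y) / 2.0
    radius = sqrt(((var_x - var_y) / 2.0) ** 2 + cov_xy**2)
    return mean + radius, mean - radius


if __name__ == "__main__":
    # Two perfectly correlated channels: all power lies along one direction.
    entries = covariance_2x2([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    print(eigenvalues_2x2(*entries))
```

The larger eigenvalue is the signal power along the dominant spatial direction and the smaller one is the power along the orthogonal direction, so changes in either across conditions reflect changes in spatially structured power.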