Statistical learning problems may be categorized as supervised or unsupervised. In supervised learning, the goal is to predict an output based on a number of input factors or variables (henceforth referred to as variables), where a prediction rule is learned from a set of examples (referred to as training examples), each showing the output for a respective combination of variables. In unsupervised learning, the goal is to describe associations and patterns among a set of variables without the guidance of a specific output. An output may be predicted after the associations and patterns have been determined. These categories are illustrated in FIGS. 1A and 1B, which show data points as a function of weight 110 and height 112. In unsupervised learning 100 in FIG. 1A, the data may be described by input variables weight 110 and height 112 without any additional information (e.g., labels) that could help to find patterns in the data. Patterns in the data may be found by learning that there are two distinct “clusters” of data points (represented by circles or decision boundaries 114 around them). Within each cluster, i.e., within group A 116 or group B 118, data points are highly similar (i.e., close together), whereas data points in different clusters are highly dissimilar (i.e., farther apart). When a new data point (i.e., a new combination of the input variables) becomes available, it may be categorized as similar to, and thus a potential member of, one of the clusters that have been discovered, as an outlier, or as a member of a new cluster.
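By way of illustration only, the clustering of unlabeled weight and height data described for FIG. 1A may be sketched as follows. The sketch assumes k-means clustering, which is one common technique and is not named above. The data values, the function names `nearest`, `kmeans` and `categorize`, and the outlier rule (a distance threshold of three times the cluster radius) are hypothetical choices made for this example.

```python
import math

# Made-up (weight in kg, height in cm) points with no labels attached.
POINTS = [
    (60.0, 165.0), (62.0, 168.0), (58.0, 163.0), (61.0, 166.0),
    (85.0, 185.0), (88.0, 188.0), (83.0, 183.0), (86.0, 186.0),
]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid-update steps for a fixed number of rounds."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster happens to receive no points.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    return centroids

def categorize(point, centroids, points, factor=3.0):
    """Return a cluster index, or None when the point looks like an outlier.

    Assumed rule: a point is an outlier if it lies farther from the nearest
    centroid than `factor` times the largest member-to-centroid distance.
    """
    i = nearest(point, centroids)
    members = [p for p in points if nearest(p, centroids) == i]
    radius = max(math.dist(p, centroids[i]) for p in members)
    return i if math.dist(point, centroids[i]) <= factor * radius else None

# Deterministic start: one initial centroid taken from each end of the list.
centroids = kmeans(POINTS, [POINTS[0], POINTS[-1]])
```

With these values, `categorize((59.0, 164.0), centroids, POINTS)` assigns the new data point to the first cluster, while a point far from both clusters, such as `(120.0, 150.0)`, is reported as an outlier.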
In supervised learning 130 in FIG. 1B, additional information about the data is available. The data points are labeled as Dutch 132 (white circles) or American 134 (filled black circles). This extra information is exactly the output one wants to predict for future data. Having it available for the training data or examples allows a predictive decision boundary 136 to be determined. In general, statistical learning involves finding a statistical model that explains the observed data and that may be used to analyze new data, e.g., learning a weighted combination of numerical variables from labeled training data to predict a class or classification for a new combination of variables. Determining a model to predict quantitative outputs (continuous variables) is often referred to as regression. Determining a model to predict qualitative outputs (discrete categories, such as ‘yes’ or ‘no’) is often referred to as classification.
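As a minimal sketch of learning a weighted combination of numerical variables from labeled training data, the following example derives a linear decision boundary from two class means. It assumes a nearest-class-mean rule, which is only one of many ways such a boundary may be determined and is not stated above. The training values and the names `class_mean`, `fit_linear_rule` and `predict` are hypothetical.

```python
# Made-up labeled training examples: ((weight in kg, height in cm), label).
TRAINING = [
    ((70.0, 184.0), "Dutch"), ((74.0, 188.0), "Dutch"), ((68.0, 180.0), "Dutch"),
    ((88.0, 176.0), "American"), ((92.0, 178.0), "American"), ((84.0, 174.0), "American"),
]

def class_mean(examples, label):
    """Per-variable mean of all training examples carrying the given label."""
    rows = [x for x, y in examples if y == label]
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_linear_rule(examples, negative, positive):
    """Return weights w and bias b such that w.x + b > 0 favors `positive`.

    The boundary w.x + b = 0 is the perpendicular bisector of the segment
    joining the two class means (an assumed, deliberately simple rule).
    """
    m_neg = class_mean(examples, negative)
    m_pos = class_mean(examples, positive)
    w = [p - n for p, n in zip(m_pos, m_neg)]
    b = -0.5 * (sum(p * p for p in m_pos) - sum(n * n for n in m_neg))
    return w, b

def predict(x, w, b, negative, positive):
    """Classify a new combination of variables using the weighted sum."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return positive if score > 0 else negative

w, b = fit_linear_rule(TRAINING, "Dutch", "American")
```

Here `predict((72.0, 186.0), w, b, "Dutch", "American")` returns the class on whose side of the boundary the new point falls. A regression model would have the same structure but would return the weighted sum itself as a continuous output rather than thresholding it.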
Developing models for statistical learning problems involving longitudinal data, in which a time series of observations is collected over a period of time, poses several challenges, including those associated with collecting the data efficiently and accurately. Analysis of the data may also be problematic, in particular for a class of problems that seek variables associated with time-varying phenomena having discrete events or epochs, each epoch having a characteristic onset time (henceforth referred to as a temporal onset). For example, in such problems there may be limited data and a plurality of potential variables to be screened. The analysis, therefore, may be underdetermined, i.e., there may be more potential variables than observations available to constrain them. In addition, the potential variables may not be independent from one another and/or samples of the potential variables may not follow a known probability distribution (for example, a normal distribution).
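The consequence of an underdetermined analysis may be illustrated with a small worked example, assuming a linear model. With two observations and three candidate variables, infinitely many weight vectors reproduce the observed outputs exactly, yet they disagree on new data. All values and the names `predict`, `fits_exactly` and `family_member` are hypothetical.

```python
# Two observations (rows) of three candidate variables (columns); made-up values.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

def predict(weights, row):
    """Output of a linear model: the weighted sum of the variables."""
    return sum(w * x for w, x in zip(weights, row))

def fits_exactly(weights, rows, outputs, tol=1e-9):
    """True if the model reproduces every observed output within tol."""
    return all(abs(predict(weights, r) - t) <= tol for r, t in zip(rows, outputs))

def family_member(t):
    """One exact fit: a particular solution plus t times a null-space direction."""
    particular = [1.0, 2.0, 0.0]
    null_direction = [1.0, 1.0, -1.0]  # every row of X maps this vector to zero
    return [p + t * n for p, n in zip(particular, null_direction)]

w_a = family_member(0.0)   # weights [1, 2, 0]
w_b = family_member(-1.0)  # weights [0, 1, 1]
new_row = [1.0, 1.0, 0.0]  # a new combination of the variables
```

Both `w_a` and `w_b` satisfy `fits_exactly(..., X, y)`, but `predict(w_a, new_row)` and `predict(w_b, new_row)` differ, so the available data alone cannot identify which variables are actually associated with the output.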
There is a need, therefore, for an analysis technique to address the challenges described above and to determine variables associated with time-varying phenomena having discrete epochs.