Machine learning may be applied to automatically generate an algorithm that is improved through experience. Applications of machine learning range from data mining programs that discover general rules in large data sets, to information filtering systems that automatically learn users' interests. The algorithm that is automatically generated and updated is often referred to as a model. Many machine learning techniques are known to produce better models by discretizing continuous variables.
Discretization refers to the process of partitioning a data set pertaining to a continuous variable into intervals, which may be referred to as bins. Typically, data is discretized into bins of equal width, where the width corresponds to a range of possible data values. However, since the data values and distribution of the data values are unpredictable, data values are often distributed unequally across the bins. If a large proportion of the data fall into a small number of bins then discriminatory power is lost. Conversely, if too little data falls into one or more bins then the ability of the model to generalize to previously unseen inputs is compromised. In either case, the model generated from the binned data is likely to be sub-optimal.
A common approach to distribute data values equally across the bins is to perform the binning operation offline. As a result, the binning and resulting model will typically not reflect the most recently received and most valuable data. As a result, decisions generated from the model may be based upon data that is outdated or irrelevant to the decision making process.