One objective of model fitting is to tune the parameters of a model so that it fits some existing data, such as a training set, in order to perform either regression or classification and to predict the function value at inputs not present in the training set. Model fitting is a crucial and often very time-consuming component of machine-learning and forecasting algorithms. Example applications of model fitting may include image classification, where the model is fitted to label a set of pictures based on an already labeled subset of the images. In this case, the application may learn to detect features and use the detected features to identify whether a picture belongs to a class. In general, this has several practical applications, such as handwriting recognition, automatic labeling for search, filtering unwanted results, etc.
Another example application of model fitting may include natural language processing. In this example, classifying sound samples may be used to generate automated subtitles or translations, or to label sound or music files so that they can be searched or filtered. Speech recognition can also be used to control devices.
A further example application of model fitting may include spam filtering in texts. For example, a model identifying spam messages may need to be tuned in order to classify new messages for automatic spam filtering.
Still another example application of model fitting may include tracking advertisements by click through rates in order to create predictions for online serving of advertisements.
Another example application of model fitting may include web traffic forecasting, such as that used to estimate the amount of traffic on a website given a set of circumstances. This can be used for better resource allocation and also for advertising inventory management.
Yet another example application of model fitting may include product recommendation systems, including those that provide suggested media, search information, or advertisements to users based on a browsing or purchase history.
The above examples are only a small selection of the areas where model fitting is widely used. In addition to supervised training for classification, model fitting may also be used for unsupervised learning, where the task is to learn a sparse representation of the identity function on the training set. Some model fitting techniques may also utilize additional noise injection to create more robust models. Unsupervised learning may be useful if the number of unlabeled examples greatly exceeds that of the labeled examples. Unsupervised learning may typically be used as pre-training before a subsequent supervised learning phase. This can improve both the training speed and the generalization error.
Often the training set is too large for the objective function to be evaluated on all of it. Sometimes it is even infinite, as training examples may be generated on the fly. If it is not technically feasible to control the selection of the training examples, then the problem may be referred to as online training.
Most model fitting applications are based on sampling. Even the use of a training set can be regarded as a form of sampling. The most primitive form of sampling is the use of stochastic gradient without batching. This algorithm loops over all examples one by one and updates the model parameters immediately by adding a small correction based on the current example. This method is one of the cornerstones of many machine learning applications. An improved version of stochastic gradient uses mini-batching: a randomly chosen subset of training examples is considered, and the gradient is computed using the subset rather than a single training example. In the case of gradient descent, which is a first order method, this results in a modest improvement, mostly due to improved memory management, as the gradients would be added up and averaged over the long run.
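As an illustration only, the difference between stochastic gradient without batching and mini-batching can be sketched in a few lines of Python. The function name `minibatch_sgd`, the one-dimensional linear model, the squared-error objective, and the learning rate are all assumptions made for this sketch and are not taken from any particular system.

```python
import random

def minibatch_sgd(xs, ys, batch_size=1, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error (hypothetical helper).

    batch_size=1 corresponds to stochastic gradient without batching;
    larger values average the gradient over a randomly chosen subset.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # random order, so each batch is a random subset
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            # Small correction to the parameters, averaged over the batch.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b

# Noise-free toy data generated from y = 3x + 1.
xs = [i / 10.0 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w1, b1 = minibatch_sgd(xs, ys, batch_size=1)  # no batching
w4, b4 = minibatch_sgd(xs, ys, batch_size=4)  # mini-batches of four examples
```

On this toy problem both settings recover approximately the same parameters; the only change between them is how many examples contribute to each update.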
Higher order methods, such as quasi-Newton methods, cannot operate on single instances without a significant loss of performance. Their strength is that they approximate the objective by a quadratic form and then minimize this approximation. A prominent example of a second order, or quasi-Newton, optimization method in machine learning is the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method, which maintains a limited-memory second order approximation of the objective during optimization.
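For reference, the quadratic approximation used by such methods can be written in standard textbook form. The notation here is an assumption introduced for illustration: $f$ is the objective, $\theta_k$ the parameters at iteration $k$, $B_k$ the current approximation of the Hessian, $\alpha_k$ a step length, and $m$ the number of stored correction pairs.

```latex
\begin{align*}
  m_k(p) &= f(\theta_k) + \nabla f(\theta_k)^{\top} p
            + \tfrac{1}{2}\, p^{\top} B_k\, p
         && \text{(quadratic model around } \theta_k\text{)} \\
  p_k &= -B_k^{-1} \nabla f(\theta_k)
         && \text{(minimizer of } m_k \text{ when } B_k \succ 0\text{)} \\
  \theta_{k+1} &= \theta_k + \alpha_k\, p_k
         && \text{(parameter update)} \\
  B_{k+1} s_k &= y_k, \quad
  s_k = \theta_{k+1} - \theta_k, \quad
  y_k = \nabla f(\theta_{k+1}) - \nabla f(\theta_k)
         && \text{(secant condition)}
\end{align*}
```

In L-BFGS, $B_k$ is never formed explicitly. Only the most recent $m$ pairs $(s_i, y_i)$ are stored, and the product $B_k^{-1} \nabla f(\theta_k)$ is computed from them. Because the pairs are built from gradient differences, noisy gradients from very small samples translate directly into a noisy curvature estimate.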
In the case of quasi-Newton methods, batching affects the performance of the algorithms profoundly. Theoretically, second order methods would require a sample that gives a good approximation of the Hessian of the objective function. In practice, however, there is a tradeoff between the accuracy of the estimation and computational cost. A better approximation of the Hessian can be obtained by increasing the sample size, but a more accurate fit will not improve performance significantly at the beginning of the optimization. As the optimization procedure progresses, larger and larger sample sizes may be required to get optimal learning performance.
Second order methods require larger batch sizes in general, and the choice of a reasonable batch size can affect the overall performance of the learning process drastically. However, the optimum batch size depends on various factors and its value changes considerably as training proceeds, so there is no one-size-fits-all solution.
Typical strategies for model fitting may involve a constant batch size or changing the batch size according to some prescribed function which is empirically determined and tuned to the specific problem. A poorly tuned batch size (or batch size selection mechanism) may result in oscillating behavior and inferior quality solutions. In addition, an ad hoc batch-sizing function may be well tuned for a specific setting, but the optimal sampling depends on the features of the model employed as well as on the underlying optimization method. If either the algorithm or the training set changes significantly, then a previously close-to-optimal sampling may not converge in the changed situation. These strategies may also result in software-engineering issues such as hand-tuned, hard-coded parameters that can seriously affect the reusability, refactorability and flexibility of the implementation.
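The tradeoff between sample size and the accuracy of the curvature estimate can be illustrated with a small experiment. This is a minimal sketch under stated assumptions: the objective is the mean of (w*x - y)^2 over a synthetic data set, so its second derivative with respect to w is twice the mean of x^2, and the helper names `sampled_curvature` and `mean_abs_error` are hypothetical.

```python
import random

def sampled_curvature(xs, sample_size, rng):
    """Estimate the second derivative of mean((w*x - y)**2) in w
    from a random subsample (drawn without replacement)."""
    sample = rng.sample(xs, sample_size)
    return 2.0 * sum(x * x for x in sample) / sample_size

def mean_abs_error(xs, sample_size, trials=200, seed=0):
    """Average absolute error of the sampled estimate over several trials."""
    rng = random.Random(seed)
    exact = 2.0 * sum(x * x for x in xs) / len(xs)
    errors = [abs(sampled_curvature(xs, sample_size, rng) - exact)
              for _ in range(trials)]
    return sum(errors) / trials

# Synthetic inputs; the distribution is an arbitrary choice for the demo.
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]

for n in (10, 100, 1000):
    print(n, mean_abs_error(xs, n))
```

The error shrinks as the sample grows and vanishes, up to rounding, when the whole data set is used, but the cost of each estimate grows in proportion to the sample size.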
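The two typical strategies described above might look like the following in code. This is a hypothetical sketch: the function names and the constants (64, 32, 1.1, 4096) are invented for illustration and stand in for the kind of hand-tuned values the paragraph refers to.

```python
def constant_batch_size(step, size=64):
    """Constant strategy: the same batch size at every optimization step."""
    return size

def geometric_batch_size(step, initial=32, growth=1.1, maximum=4096):
    """Prescribed schedule: grow the batch geometrically, up to a cap.

    The growth rate and the cap are hard-coded guesses that would have to
    be re-tuned whenever the model, data, or optimizer changes.
    """
    return min(maximum, int(initial * growth ** step))

# Batch sizes for the first few steps under the geometric schedule.
schedule = [geometric_batch_size(step) for step in range(5)]
```

Neither function looks at the state of the optimization. The schedule depends only on the step counter and on constants fixed in advance, which is why a change in the algorithm or the training set can leave it poorly matched to the problem.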