1. Field of the Invention
The present invention relates to providing an implementation of Non-negative Matrix Factorization functionality integrated into a relational database management system
2. Description of the Related Art
Traditionally, as part of standard numerical analysis, matrix factorization is a common preprocessing procedure performed prior to solving a linear system of equations. For data mining, matrix factorization offers a way to reduce the dimensionality of a dataset and extract features that reveal interesting structure in the data or provide inputs to further types of analysis. In matrix factorization, the number of the dataset independent columns is reduced by projection onto a lower dimensional space (e.g. smaller matrices).
This type of rank reduction by factorization can reveal interesting low-dimensional subspaces embedded in large dimensionality datasets space and is a useful operation for pattern discovery and feature extraction. For example, the traditional Principal Component Analysis (PCA) uses a projection of the data on dimensions along which it varies the most and can be used to visualize the most dominant structure in a dataset.
Non-negative matrix factorization (NMF) involves imposing non-negativity constraints on the factors. NMF has been shown to be a useful decomposition and feature extraction method in fields such as object detection and recognition, and to be a valuable alternative to PCA. By forcing a dataset (matrix) to “fit” into a product of smaller datasets (matrices) NMF compresses the data and tends to eliminate some of the redundancies and expose the most common patterns. By using a parts-based or component-based decomposition, and in contrast to PCA and other techniques, the compressed version of the data is more interpretable and can be used to understand interesting patterns and common trends in the dataset. The NMF decomposition also induces a numerical taxonomy that can be used for grouping the rows or columns of the original dataset. The extracted features can be used as inputs to other analysis tasks such as classification or indexing. This procedure has proven useful in face recognition problems and in the discovery of semantic features in texts.
However, there are some limitations on traditional NMF techniques. For example, NMF has traditionally been applied to “flat” or non-relational datasets. This limits the analysis that may easily be performed with NMF. Conventional system require the extraction of data from the database into a statistical package where processing could be performed. This process is complex and not likely to be attempted by the user. This process is also relatively expensive and time consuming to perform. In addition, traditional NMF techniques do not handle “sparse” datasets well and are not applicable to categorical data. This prevents traditional NMF analysis from being efficiently applied to particular types of data, such as textual data. A need arises for a technique by which NMF may be applied to relational datasets, to sparse datasets, and categorical data.