Computer database software stores data records in tables. A database is a set of tables of data along with information about the relations between these tables. Each table represents a relation over the data, consists of one or more fields (sometimes called columns), and is made up of a set of records. The contents of a database are extracted and/or manipulated using a query language that is supported by the database software. Current query languages support the extraction of information that is well-specified (e.g., via a SQL query or a logical specification of a subset of the data contained in the database). To retrieve data from the database, one must therefore supply an exact description of the desired target data set.
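As an illustration of a well-specified query, the following minimal sketch uses Python's built-in `sqlite3` module to extract an exactly described target set. The `sales` table, its fields, the sample rows, and the `sales_by_store` helper are all hypothetical and are introduced only for this example.

```python
import sqlite3


def sales_by_store(conn, start, end):
    """Return (store, total) rows for sales dated within [start, end]."""
    query = (
        "SELECT store, SUM(amount) FROM sales "
        "WHERE sale_date BETWEEN ? AND ? "
        "GROUP BY store ORDER BY store"
    )
    return conn.execute(query, (start, end)).fetchall()


# Hypothetical in-memory table with made-up rows (not data from the source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "2024-01-05", 100.0),
        ("north", "2024-01-20", 50.0),
        ("south", "2024-01-11", 75.0),
        ("south", "2024-02-02", 20.0),
    ],
)

# The target set is described exactly: one table, one date range, one grouping.
report = sales_by_store(conn, "2024-01-01", "2024-01-31")
print(report)
```

Every part of the request (table, date range, grouping) has to be stated explicitly; the query language offers no way to ask for records that merely resemble some examples.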
One important use of database technology is to help individuals and organizations make decisions based on the data contained in the database. Decision support information varies from the well-specified (e.g., give me a report of sales by store over an exactly specified period of time) to the not-so-well-specified (e.g., find me records in the data that are "similar" to records in table A but "dissimilar" from records in table B). In the latter case, the target set is not specified via an exact query, but is specified implicitly by labeling (or tagging) records in the data (e.g., these records are to be tagged `A` and these others are to be tagged `B`). This is an example of a classification task. Classification is one of the activities used in the areas of decision support, data analysis, and data visualization. Given an existing database containing records, one is interested in predicting the values of some target fields (also called variables) based on the values of other fields (variables) in the database.
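To make the idea of an implicitly specified target set concrete, the sketch below assigns a tag to untagged records by comparing them with records already tagged `A` or `B`. The 1-nearest-neighbor rule with Euclidean distance is an assumption chosen for brevity, not a method described in the text; the `classify` function and the numeric records are likewise hypothetical.

```python
import math


def classify(record, tagged):
    """Return the tag of the tagged record closest to `record`.

    Assumption: "similar" means small Euclidean distance between
    numeric field values (a 1-nearest-neighbor rule).
    """
    _, nearest_tag = min(tagged, key=lambda item: math.dist(record, item[0]))
    return nearest_tag


# Hypothetical records: two numeric fields each, tagged by a user.
tagged = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((8.0, 9.0), "B"),
    ((9.0, 8.0), "B"),
]

# Untagged records receive the tag of their most similar tagged record.
untagged = [(2.0, 1.0), (8.5, 8.5)]
predicted = [classify(record, tagged) for record in untagged]
print(predicted)
```

No exact query is written here: the tagged records themselves define which records belong to the target set.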
As an example, a marketing company may want to decide whom to target for an ad campaign based on historical data about a set of customers and how they responded to previous ad campaigns. In this case, there is one field being predicted: the field in the database that indicates whether a customer responded to a previous campaign, which may be called the "response field". The fields used to predict the response field (and hence to classify records) are other fields in the database, for example, the age of the customer, whether the customer owns a vehicle, the presence of children in the household, and so forth. Other examples where classification over a database is useful include fraud detection, credit approval, diagnosis of system problems, diagnosis of manufacturing problems, recognition of signatures in signals, and object recognition in image analysis.
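The marketing example can be sketched as a very simple classifier that predicts the response field from the other fields. The approach shown, a majority vote among historical customers with the same predictor values, is an assumption made for illustration, as are the field names (`age`, `owns_vehicle`, `has_children`, `responded`), the age threshold of 40, and the sample records.

```python
from collections import Counter, defaultdict


def key_of(customer):
    """Reduce a customer record to the predictor fields used for grouping."""
    return (customer["age"] >= 40, customer["owns_vehicle"], customer["has_children"])


def fit(history):
    """Count response-field values per combination of predictor fields."""
    counts = defaultdict(Counter)
    for customer in history:
        counts[key_of(customer)][customer["responded"]] += 1
    return counts


def predict(counts, customer, default=False):
    """Predict the majority response among matching historical customers."""
    seen = counts.get(key_of(customer))
    if not seen:
        return default  # no historical customer shares these field values
    return seen.most_common(1)[0][0]


# Hypothetical historical records (made-up values, not data from the source).
history = [
    {"age": 52, "owns_vehicle": True, "has_children": True, "responded": True},
    {"age": 45, "owns_vehicle": True, "has_children": True, "responded": True},
    {"age": 61, "owns_vehicle": True, "has_children": True, "responded": False},
    {"age": 23, "owns_vehicle": False, "has_children": False, "responded": False},
    {"age": 31, "owns_vehicle": False, "has_children": False, "responded": False},
]
counts = fit(history)

prospect = {"age": 48, "owns_vehicle": True, "has_children": True}
print(predict(counts, prospect))
```

A practical classifier would generalize to field combinations it has not seen; this sketch simply falls back to `default` in that case.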