Features represent the characteristics of objects, and selecting or synthesizing composite features is the key to object recognition.
Working with an appropriate set of features is crucial to the success of machine-learning, artificial intelligence, and data mining algorithms and processes. For the convenience of the reader, the term “machine-learning” will be used hereinafter and should be understood to encompass machine-learning as well as artificial intelligence and data mining. Typically, obtaining such an appropriate set of features involves three steps: features extraction, features generation, and features selection.
Features extraction is used when the amount of raw data is too vast for the machine-learning algorithm to operate on directly. In this step, therefore, the data is reduced to a set of features. For example, in the telecommunication field, the raw data may comprise all the Call Detail Records (CDRs) available to the telecom operator, from which it is possible to extract features such as the number of phone calls made by a subscriber within a period of time (e.g. within the last month), or the total number of minutes that the subscriber used his telephone device for voice calls during the last week.
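The extraction step described above can be illustrated with a minimal sketch. The `CDR` record layout, the field names, the 30-day and 7-day windows, and the function name `extract_features` are all assumptions made for illustration; real CDR formats differ between operators.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CDR:
    """Hypothetical, simplified call record (illustrative fields only)."""
    caller: str
    callee: str
    start: datetime
    duration_sec: int


def extract_features(cdrs, subscriber, now):
    """Reduce raw CDRs to two features for one subscriber.

    Assumes "last month" means the 30 days before `now` and
    "last week" means the 7 days before `now`.
    """
    month_ago = now - timedelta(days=30)
    week_ago = now - timedelta(days=7)
    calls_last_month = 0
    seconds_last_week = 0
    for record in cdrs:
        # Only outgoing calls of this subscriber, not in the future.
        if record.caller != subscriber or record.start > now:
            continue
        if record.start >= month_ago:
            calls_last_month += 1
        if record.start >= week_ago:
            seconds_last_week += record.duration_sec
    return {
        "calls_last_month": calls_last_month,
        "voice_minutes_last_week": seconds_last_week / 60,
    }


# Illustrative data only.
NOW = datetime(2024, 3, 31)
SAMPLE_CDRS = [
    CDR("A", "B", datetime(2024, 3, 30, 10, 0), 120),
    CDR("A", "C", datetime(2024, 3, 10, 9, 0), 300),
    CDR("A", "B", datetime(2024, 1, 1, 9, 0), 60),    # older than 30 days
    CDR("B", "A", datetime(2024, 3, 29, 9, 0), 600),  # different caller
]
FEATURES_A = extract_features(SAMPLE_CDRS, "A", NOW)
```

In this sketch each subscriber requires its own pass over the records, which is acceptable for a toy data set but not for the volumes discussed later in this section.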
The step of features generation is a process of generating new, dependent features by applying functions to existing or extracted features. Returning to the previous telecommunication example, one may generate a new feature, the average duration of a subscriber's calls during the last month, from two extracted features: the number of calls the subscriber made and the total duration of these calls.
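As a minimal sketch of the generation step, the average call duration can be derived from the two extracted features. The dictionary keys and the function name are hypothetical, and returning 0.0 for a subscriber with no calls is an assumption; another convention (such as a missing value) may suit a given model better.

```python
def generate_avg_call_duration(features):
    """Return a copy of `features` with a derived average-duration feature.

    Expects the (hypothetical) keys "calls_last_month" and
    "total_duration_last_month_sec".
    """
    num_calls = features["calls_last_month"]
    total_sec = features["total_duration_last_month_sec"]
    generated = dict(features)
    # Guard against division by zero for subscribers with no calls.
    generated["avg_call_duration_last_month_sec"] = (
        total_sec / num_calls if num_calls else 0.0
    )
    return generated
```

For example, a subscriber with 4 calls totalling 600 seconds receives an average of 150.0 seconds.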
Features selection is a process of selecting a subset of all the extracted and generated features for use in the machine-learning process of building a model or predictor. The process of features selection enables discarding redundant or irrelevant features, which may cause undesired phenomena when machine-learning algorithms are used to construct the model. Moreover, many machine-learning techniques have limitations (e.g., due to complexity) on the number of features they can handle effectively. Features selection permits reducing the number of features to a volume manageable by the machine-learning algorithm. It should be noted that, since redundancy considerations are a key aspect of features selection, each feature is selected in view of which other features are selected; the best practice is therefore to carry out the features selection process after completing the phases of features extraction and features generation.
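The role of redundancy in selection can be illustrated with one simple technique, chosen here only for illustration: a greedy filter that ranks features by absolute Pearson correlation with the target and skips any feature that is highly correlated with one already selected. The function names, the 0.95 threshold, and the sample data are assumptions; many other selection methods exist.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, max_features, redundancy_threshold=0.95):
    """Greedy relevance ranking with a redundancy check.

    `columns` maps a feature name to its list of values. A feature is
    skipped when its absolute correlation with an already selected
    feature reaches `redundancy_threshold` (an assumed cut-off).
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if len(selected) >= max_features:
            break
        if all(
            abs(pearson(columns[name], columns[chosen])) < redundancy_threshold
            for chosen in selected
        ):
            selected.append(name)
    return selected


# Illustrative data: "near_duplicate" carries almost the same information as "calls".
COLUMNS = {
    "calls": [1, 2, 3, 4, 5],
    "near_duplicate": [1, 2, 3, 4, 6],
    "other": [3, 1, 4, 1, 5],
}
TARGET = [1, 2, 3, 4, 5]
SELECTED = select_features(COLUMNS, TARGET, max_features=2)
```

Because the redundancy check compares each candidate against the features already chosen, the full set of candidates must exist before selection starts, which is why selection follows extraction and generation.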
It is important to note that even though many machine-learning algorithms practically require the use of relatively small sets of features, creating a rich universe of features by features extraction and generation, and then using features selection to pick a preferred subset, is a very important factor in the success of the machine-learning process.
In many domains it is not clear which features will be the most beneficial. It is therefore desirable to extract and generate a very large set of features, which will be pruned at the selection stage. However, in many domains, for example domains that include temporal relations between entities and large amounts of data, generating such a multitude of features by using existing state-of-the-art methods is impractical. For example, let us assume that one has a set of 100 billion CDRs corresponding to the call records of 30 million subscribers over a period of one year, and that this information is to be used, by applying machine-learning techniques, to identify families among the subscribers. Without a priori information regarding which features might be important for building the desired model, it would be preferable to extract a rich set of features for each subscriber. Such a set of features may include, for example: the subscriber's average number of calls; his average number of calls on Saturdays between 8 and 10 AM; the 3 subscribers he called most during the last month; the 3 subscribers with whom he spoke the highest number of minutes on Sundays between 4 and 6 PM over the last year; the location from which the subscriber made most of his calls last week; etc. Hundreds or even thousands of such features would be extracted and later used for features generation and features selection.
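Two features of the kind listed above can be sketched as follows. The tuple layout `(caller, callee, start)` and both function names are assumptions for illustration; the count of Saturday-morning calls is shown rather than the average, since averaging requires a choice of period that the text does not specify.

```python
from collections import Counter
from datetime import datetime


def top_callees(cdrs, subscriber, k=3):
    """Return the k subscribers that `subscriber` called most often."""
    counts = Counter(
        callee for caller, callee, _start in cdrs if caller == subscriber
    )
    return [callee for callee, _count in counts.most_common(k)]


def saturday_morning_calls(cdrs, subscriber):
    """Count calls made by `subscriber` on Saturdays from 8:00 up to, not including, 10:00."""
    return sum(
        1
        for caller, _callee, start in cdrs
        if caller == subscriber
        and start.weekday() == 5  # Monday is 0, so Saturday is 5
        and 8 <= start.hour < 10
    )
```

Each such feature combines a temporal filter with a link relation between subscribers, and each is computed here by its own scan over the records.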
Given the volume of data, its complexity (temporal and link relations), and the number of features, the straightforward approach of “running a query” per feature is simply impractical.
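A rough back-of-the-envelope calculation illustrates the scale. It assumes, as a worst case, that each per-feature query scans every record; the figure of 1,000 features and the throughput of one billion record reads per second are purely hypothetical.

```python
def naive_record_reads(num_records, num_features):
    """Record reads needed if every feature query scans all records (assumed worst case)."""
    return num_records * num_features


NUM_CDRS = 100_000_000_000        # 100 billion CDRs, as in the example above
NUM_FEATURES = 1_000              # hypothetical number of features
READS_PER_SECOND = 1_000_000_000  # hypothetical sustained throughput

TOTAL_READS = naive_record_reads(NUM_CDRS, NUM_FEATURES)  # 10**14 reads
HOURS = TOTAL_READS / READS_PER_SECOND / 3600             # roughly 27.8 hours
```

Under these assumptions a single round of extraction takes more than a day, before accounting for the joins and sorts that temporal and link-based features require.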
Therefore, a solution is required to overcome the problem of extracting large numbers of features, by carrying out effective features synthesis processes thereon.