In data processing, feature extraction is a special form of dimensionality reduction. When the input data to a process (algorithm) is too large to be processed and it includes some redundancy, the input data may be transformed into a reduced representation set of features, i.e., a features vector. If done properly, the features set includes the relevant information from the input data to perform the desired task using the reduced representation by the feature vectors. Feature extraction techniques simplify the amount of data required to describe a large set of data accurately.
Feature extraction has been widely used in image processing and object detection and recognition which use different algorithms to detect and isolate various features of a dataset, for example, digitized image or video, and detect object(s) within those images or videos. Object recognition is the task of finding and identifying objects in an image or video sequence. Many approaches to the task have been implemented over past several years, such as edge detection, or recognition by parts. In a typical feature-based approach, such as edge detection, a search is used to find feasible matches between object features and image features. The primary constraint of these approaches is that a single position of the object must account for all of the feasible matches. These methods then extract features from the objects to be recognized and the images to be searched.
Classification is a method of identifying to which of a set of categories (classes) a new observation belongs, on the basis of a training set of data containing observations (or instances) for which their category is already known. The individual observations are analyzed into a set of quantifiable properties, known as features.
A classifier is a classification algorithm. In the context of machine learning, classification is considered an instance of supervised learning, for example, learning where a training set of correctly identified observations (features) is available. The corresponding unsupervised procedure is known as clustering, which comprises of grouping data into categories based on some measure of inherent similarity. For example, the distance between instances, considered as vectors in a multi-dimensional vector space. In machine learning, the observations are often known as instances, the explanatory variables are termed features and the possible categories to be predicted are classes.
A common subclass of classification is probabilistic classification. Algorithms of this nature use statistical inference to find the best class for a given instance. Unlike other algorithms, which simply output a best class, probabilistic algorithms output a probability of the instance being a member of each of the possible classes. Typically, the best class is then selected as the one with the highest probability.
Classification and clustering are examples of the more general problem of pattern recognition, which is the assignment of an output value to a given input value. Other examples are regression, which assigns a real-valued output to each input; or sequence labeling, which assigns a class to each member of a sequence of values, for example, part of speech tagging, which assigns a part of speech to each word in an input sentence.
A training set is a set of data used to discover potentially predictive relationships. In machine learning field, a training set includes an input vector and an answer vector, which are used together with a supervised learning method to train a knowledge database. In statistical modeling, a training set is used to fit a model that can be used to predict a “response value” from one or more “predictors.” The fitting can include both variable selection and parameter estimation. Statistical models used for prediction are often called regression models, of which linear regression and logistic regression are two examples.
Most of the current Ocean Ship Detection (OSD) screening algorithms are designed to operate in open ocean and use a land mask to eliminate the parts of the scene that are on or near land. These screeners examine the entire image, except for the land mask derived from Digital Terrain Elevation Data (DTED), and find anomalies in the water that could be ships. The screening algorithm is designed to find candidates of all sizes and to operate in all kinds of weather and sea conditions. It uses simple features whose thresholds are designed to provide a very high detection rate (at the expense of a high false alarm rate).
However, these screening algorithms typically have many false alarms due to, for example, small clouds, white caps in the sea and surf lines. Some of the current OSD ship screening algorithm produces almost 10 false alarms for every real ship it detects. Depending on the mission, the impact of the high OSD false alarm rate ranges from mildly annoying to a complete showstopper. This greatly reduces the analyst workload and allows OSD to be useful in applications where small false alarm rates are required.
Accordingly, there is a need for a ship/boat detection technique that is capable of rejecting the false alarms due to the surroundings of the ships/boats and is less complex and computationally intensive and provides high accuracy for the detection.