Marine mammals (sea mammals) include all mammals that have readapted to life in the sea, in particular all species of whales and seals. Whales in particular are severely threatened with extinction. Besides direct visual detection, such marine mammals can above all be recognized by their thermal signatures, i.e. signatures generated by heat, in particular body parts emerging from the water, such as fins and flukes, the whale blow (body-temperature breathing air exhaled from the blowhole) and so-called “footprints”, i.e. turbulence of the water in the animal's wake. Owing to increasing concern about the impact on marine mammals of anthropogenic activities that generate underwater noise (e.g. pile driving for wind farms or hydro-acoustic exploration by the oil and gas industry), regulations, e.g. for seismic-geophysical surveys within the EEZ (Exclusive Economic Zone) of the US and the UK, require seismic sources (“air-guns”) to be switched off whenever marine mammals are present within a radius of typically 1 km to 3 km, the so-called “mitigation radius”. In addition, an observation period of 30 min free of marine mammals is required before the air-guns are used. Whether marine mammals are present within the mitigation radius, or are at risk of entering it, is currently determined, normally during the day, by visual observation by up to three observers working simultaneously. At night, with reduced visibility (visual range, reflections, lighting conditions) or in adverse weather conditions (wind, fog, rough seas), however, visual observation is not possible. Even with sufficient visibility, visual observation across the entire horizon, which usually extends over long periods of time, demands the highest concentration from the observers, since the thermal signature of the marine mammal to be detected is usually visible for only a few seconds against the often highly variable background of the waves. 
Because fatigue sets in quickly, each observer can therefore only be deployed for a relatively short time. For this reason, there are increasing attempts to use automatic systems with cameras and automatic analysis of the recorded images. Previous attempts have concentrated mainly on thermographic methods, in which infrared cameras are used as image sources.
First, some basic definitions of the terms used hereinafter are given, as they are familiar to the person skilled in the art.
A classification, typification or systematics is a methodical collection of abstract classes (also called concepts, types or categories) used for differentiation and organization. The individual classes are normally formed by classing, i.e. by dividing objects on the basis of certain characteristics, and are arranged hierarchically. The set of class names forms a controlled vocabulary. Applying a classification to an object by selecting a matching class of the given classification is called grading.
Verifying or verification is the proof that an assumed or asserted fact is true. The term is used in different ways, depending on whether establishing the truth is to rely solely on the evidence put forward, or whether the confirming examination and certification of the fact by an independent authority, which is easier to realize in practice, also counts as verification (compare Wikipedia, keyword “verification”). The latter meaning applies in the present case.
Supervised learning is a subfield of machine learning. Here, learning means the ability to reproduce regularities. The results are known from laws of nature or from expert knowledge and are used to train the system. A learning algorithm attempts to find a hypothesis that makes predictions as accurate as possible. Here, a hypothesis is a mapping that assigns the presumed output value to each input value. To this end, the algorithm adjusts the free parameters of the selected hypothesis class. The method depends on an output to be learned that is determined in advance and whose results are known. The results of the learning process can be compared with the known correct results, i.e. “supervised”. After training, i.e. the learning process, the system should be able to deliver a correct output for an unknown input that is similar to the learned examples. To test this ability, the system is validated. One possibility is to split the available data into a training set and a test set. The objective is to minimize the error measure on the test set, which is not used for training. Cross-validation methods are frequently applied for this purpose.
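The train/test split and cross-validation described above can be illustrated with a minimal Python sketch. It assumes a nearest-centroid classifier as the hypothesis class (the per-class mean vectors are its free parameters) and assigns samples to folds round-robin; the function names and the toy data are hypothetical and serve only as an illustration, not as the method of the present application.

```python
def k_fold_indices(n, k):
    """Assign indices 0..n-1 to k folds round-robin (fold i gets i, i+k, i+2k, ...)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_centroids(X, y):
    """Training: the free parameters of the hypothesis are the per-class mean vectors."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict(centroids, x):
    """Hypothesis: map an input to the label of the nearest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda lbl: sum((c - v) ** 2 for c, v in zip(centroids[lbl], x)))


def cross_val_error(X, y, k=5):
    """Mean error rate over k folds; each fold is held out once and never used for training."""
    errors = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / len(errors)


# Hypothetical two-class toy data; labels alternate so every training split contains both classes.
X = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.3],
     [4.8, 5.0], [0.3, 0.2], [5.1, 5.3], [0.0, 0.4], [4.9, 4.8]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(cross_val_error(X, y, k=5))  # prints 0.0
```

With real data one would shuffle the samples before assigning folds, so that no fold is dominated by one class or one recording period.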
A support vector machine (SVM) is a classifier that divides a set of objects into classes such that a region around the class boundaries, as wide as possible, remains free of objects. The support vector method is a purely mathematical method of pattern recognition that is implemented in computer programs. The starting point for constructing a support vector machine is a set of training objects (training datasets), for each of which it is known which class it belongs to. Each object (each dataset) is represented by a vector in a vector space. The task of the support vector machine is to fit a hyperplane into this space that acts as a separating plane and divides the training objects into two classes. In doing so, the distance between the hyperplane and the vectors closest to it is maximized. This wide, empty margin is later intended to ensure that even objects that do not exactly match the training objects are classified as reliably as possible. When the hyperplane is used, it is not necessary to consider all training vectors. Vectors located farther away from the hyperplane and, as it were, “hidden” behind a front of other vectors do not influence the location and orientation of the separating plane. The hyperplane depends only on the vectors closest to it, and only these are required to describe the plane in a mathematically exact fashion. These closest vectors are called support vectors, after their function, and give support vector machines their name. A hyperplane cannot be “bent”, so a clean separation with a hyperplane is only possible when the objects are linearly separable. This is generally not the case in real applications. For data that are not linearly separable, support vector machines use the kernel trick to introduce a non-linear class boundary. The idea behind the kernel trick is to map the vector space, and with it the training vectors located therein, into a higher-dimensional space. 
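The margin-maximizing linear separation described above can be sketched in a few lines of pure Python. This is not LIBSVM's solver: it is a minimal soft-margin linear SVM trained by batch subgradient descent on the regularized hinge loss, with a Pegasos-style step size. The regularization weight `lam`, the number of epochs, the labels in {-1, +1} and the toy data are assumptions chosen for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Soft-margin linear SVM trained by batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))) with labels y_i in {-1, +1}.
    Only samples inside the margin (y_i * f(x_i) < 1) contribute to the gradient, mirroring the
    fact that vectors far behind the margin do not influence the separating hyperplane.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # decreasing step size (Pegasos-style schedule)
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge loss active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def predict_svm(w, b, x):
    """Return the side of the hyperplane w . x + b = 0 on which the point lies."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable training data.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict_svm(w, b, [1.0, 1.0]), predict_svm(w, b, [-1.0, -0.5]))  # prints: 1 -1
```

The two unseen points lie closer to the hyperplane than any training vector but are still assigned to the correct side, which is the purpose of the wide margin.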
In a space with a sufficiently high number of dimensions (in case of doubt, infinitely many), even the most intricately nested set of vectors can be linearly separated. The separating hyperplane is then determined in this higher-dimensional space. Upon transformation back into the lower-dimensional space, the linear hyperplane becomes a non-linear, possibly even non-contiguous hypersurface that cleanly separates the training vectors into two classes (compare C.-C. Chang and C.-J. Lin: “LIBSVM: A Library for Support Vector Machines”, Download citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.66.2871; Source csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf; Archive CiteSeerX, Scientific Literature Digital Library and Search Engine (United States)).
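The kernel trick can be made concrete with the homogeneous polynomial kernel of degree 2 in two dimensions, chosen here only as an example. Its explicit feature map φ sends a 2-D point into a 3-D space, and the kernel value k(x, z) = (x · z)² equals the dot product φ(x) · φ(z), so an SVM can work in the higher-dimensional space without ever constructing it. (For the Gaussian RBF kernel the corresponding feature space is infinite-dimensional.) The point sets and the separating plane below are hypothetical illustration data.

```python
import math


def phi(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, evaluated without constructing phi."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Points inside vs. outside the unit circle: not linearly separable in 2-D.
inner = [(0.2, 0.1), (-0.3, 0.4), (0.5, -0.5), (-0.1, -0.2)]
outer = [(1.5, 0.0), (-1.2, 1.1), (0.3, -1.6), (-1.4, -0.9)]


def side(x, w=(1.0, 0.0, 1.0), b=-1.0):
    """Linear decision in feature space: phi1 + phi3 - 1, i.e. x1^2 + x2^2 - 1 in the input space."""
    return 1 if dot(w, phi(x)) + b > 0 else -1


print(poly_kernel((1.0, 2.0), (3.0, -1.0)))  # prints 1.0, same as dot(phi(x), phi(z)) up to rounding
print([side(p) for p in inner + outer])
```

The plane φ1 + φ3 = 1 is linear in feature space; mapped back to the input plane it becomes the unit circle, a non-linear class boundary.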
Principal component analysis (PCA) is a method of multivariate statistics that is used especially in image processing. It serves to structure, simplify and illustrate large datasets by approximating a multitude of statistical variables by a smaller number of linear combinations that are as meaningful as possible (the “principal components”). The underlying dataset typically has the structure of a matrix of objects and their characteristics. Such a dataset can be illustrated as a set of n points in p-dimensional space. The objective of principal component analysis is to project these data points into a q-dimensional subspace (q < p) such that as little information as possible is lost and the redundancy present in the data points in the form of correlation is summarized. Mathematically, a principal axis transformation is performed. Principal component analysis is problem-dependent, because a separate transformation matrix must be calculated for each dataset. The coordinate system is rotated such that the covariance matrix is diagonalized, i.e. the data are decorrelated (the correlations are the off-diagonal entries of the covariance matrix). For normally distributed datasets, this means that after PCA the individual components of each dataset are statistically independent of one another, since the normal distribution is completely characterized by the zeroth (normalization), first (mean) and second moments (covariances). If the datasets are not normally distributed, the data will still be statistically dependent even after PCA, although they are now decorrelated. PCA is therefore an optimal method in particular for normally distributed datasets.
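The principal axis transformation can be sketched for the case p = 2, where the eigen-decomposition of the symmetric 2x2 covariance matrix has a closed form and no linear-algebra library is needed. For image data with many dimensions one would instead use a library routine such as NumPy's `numpy.linalg.eigh`. The dataset below is hypothetical; the sketch shows that the covariance matrix of the rotated data is diagonal and that keeping only the first component gives the reduction to q = 1.

```python
import math


def covariance_2d(points):
    """Sample mean and 2x2 covariance matrix of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), [[sxx, sxy], [sxy, syy]]


def pca_2d(points):
    """Principal axis transformation in 2-D via the closed-form eigen-decomposition."""
    (mx, my), C = covariance_2d(points)
    a, b, c = C[0][0], C[0][1], C[1][1]
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc  # variances along the principal axes
    if abs(b) > 1e-12:
        v1 = (b, lam1 - a)  # solves (C - lam1 * I) v = 0
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)  # data already decorrelated
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # orthogonal second axis: the coordinate system is rotated
    scores = [((px - mx) * v1[0] + (py - my) * v1[1],
               (px - mx) * v2[0] + (py - my) * v2[1]) for px, py in points]
    return (lam1, lam2), (v1, v2), scores


# Hypothetical correlated 2-D data.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
(lam1, lam2), axes, scores = pca_2d(data)
_, C_scores = covariance_2d(scores)  # off-diagonal entries are ~0: the data are decorrelated
reduced = [s[0] for s in scores]  # q = 1: keep only the first principal component
print(lam1, lam2, C_scores)
```

Because lam1 is much larger than lam2 for this data, dropping the second component loses little of the total variance lam1 + lam2.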
In the threshold method (first break picking), data lying above or below a predetermined limit value (threshold value) are selected in a targeted fashion. In the STA/LTA quotient method, a quotient is formed from different average values (short-term average and long-term average) of a parameter relevant to the respective problem and compared with a specified threshold value, which, where applicable, may vary with time and case. The threshold algorithm is known in particular from seismology (compare “Automatic time-picking of first arrivals on noisy microseismic data” by J. Wong et al., Conference Abstract, Canadian Society of Exploration Geophysicists CSEG Microseismic Workshop 2009; “Automatic time-picking of Microseismic Data Combining STA/LTA and the Stationary Discrete Wavelet Transform” by I. R. Rodriguez, 2011 CSPG CSEG CWLS Convention).
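The difference between plain threshold picking and the STA/LTA quotient can be illustrated with a minimal Python sketch. It assumes that the absolute amplitude is the relevant parameter, uses trailing windows for both averages, and picks the first sample whose value exceeds the threshold; the window lengths, the threshold of 3 and the synthetic signal are hypothetical choices for illustration only.

```python
def sta_lta(signal, n_sta, n_lta):
    """Ratio of short-term to long-term trailing mean of |signal|; 0 until the LTA window is full."""
    energy = [abs(s) for s in signal]
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = sum(energy[i - n_sta + 1:i + 1]) / n_sta
        lta = sum(energy[i - n_lta + 1:i + 1]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def first_pick(values, threshold):
    """Threshold picking: index of the first value above the threshold, or None."""
    for i, v in enumerate(values):
        if v > threshold:
            return i
    return None


# Hypothetical trace: low-amplitude background, then a strong event starting at sample 100.
signal = [0.1 * (-1) ** i for i in range(100)] + [1.0 * (-1) ** i for i in range(100)]
ratios = sta_lta(signal, n_sta=5, n_lta=50)
print(first_pick([abs(s) for s in signal], 0.5), first_pick(ratios, 3.0))  # prints: 100 101
```

A fixed amplitude threshold has to be tuned to the background level, whereas the STA/LTA quotient normalizes by the recent background and therefore reacts to relative changes; the price is a short delay set by the STA window.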
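The thesis “Hast Du's auf'm Schirm? Entwurf und Implementierung eines stabilen multifunktionalen Mehrkamera-Outdoor-Echtzeitsystems zur automatischen Objektdetektion im Infrarotbereich” (“Got it on screen? Design and implementation of a stable, multifunctional, multi-camera outdoor real-time system for automatic object detection in the infrared range”; Marc Ritter, Mar. 19, 2007, Chemnitz University of Applied Sciences; cited in “Entwurf eines Echtzeit-Bildverarbeitungssystems zur automatischen Erkennung von Walen im Umfeld der Antarktis” (“Design of a real-time image processing system for the automatic detection of whales in the Antarctic region”) by Marc Ritter in “15 Jahre Künstliche Intelligenz an der TU Chemnitz” (“15 Years of Artificial Intelligence at TU Chemnitz”), CSR-08-01, April 2008, pages 231 to 250) was initiated and supervised by the inventors of the present invention.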
For scanning the water surface, the thesis used, in a stationary set-up, one visual camera with an acquisition angle of 24° and two infrared cameras with acquisition angles of 12° and 7°, respectively, as the camera system (compare page 9, FIG. 1.5 of the thesis). The basic modular pipeline processing is shown on page 62, FIG. 5.1, and the associated module hierarchy on page 47, FIG. 4.2 of the thesis. Within the overall processing, image pre-processing already forms an integral part of the detection process (compare page 24, FIG. 2.2 of the thesis). In image pre-processing, the image data are stored in a ring buffer according to the FIFO principle (first in, first out) (compare page 53, Chapter 4.3.1 of the thesis). For image segmentation, the image is subdivided into weighted image parts, in which, on the one hand, homogeneous segmentation objects with similar characteristics and, on the other hand, object boundaries are found. With such weighted segmentation, however, relevant signals may already be lost. Already in image pre-processing, an attempt is made to improve the signal-to-noise ratio using several Gaussian filters and empirically determined weighting values (compare page 68, Table 5.1 of the thesis). Filtering, however, may likewise cause relevant signals to be lost. In addition, determining the factors empirically may cause data required for later classification to be ignored. For detection, a Sobel filter is applied twice to the signal-enhanced image. Only the points of highest intensity (i.e. of steepest edges) are considered and used for classification; the local contrast is disregarded. Classification consists of checking whether the steepest edge was detected at the same point in the image five times in a row (likewise a purely empirically determined value). No optimization in the sense of adapting to current boundary conditions (changed environmental conditions) is undertaken.
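The detection and classification steps of the known approach can be reconstructed roughly as follows, to make its limitations concrete. This is a hypothetical sketch based only on the description above, not code from the thesis: "applying a Sobel filter twice" is assumed here to mean applying the horizontal and the vertical Sobel kernel and combining them into a gradient magnitude; Gaussian pre-filtering and weighted segmentation are omitted; and the buffer size of 10 frames as well as all names (`ConsecutiveEdgeDetector`, `frame_with_spot`) are assumptions.

```python
import math
from collections import deque

# 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(img, k):
    """Apply a 3x3 kernel by correlation; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(k[i][j] * img[r + i - 1][c + j - 1]
                            for i in range(3) for j in range(3))
    return out


def steepest_edge(img):
    """Position of the largest Sobel gradient magnitude (first one in scan order on ties)."""
    gx, gy = filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y)
    best, best_pos = -1.0, None
    for r in range(len(img)):
        for c in range(len(img[0])):
            mag = math.hypot(gx[r][c], gy[r][c])
            if mag > best:
                best, best_pos = mag, (r, c)
    return best_pos


class ConsecutiveEdgeDetector:
    """Reports a detection when the steepest edge stays at one pixel for `required` frames in a row."""

    def __init__(self, required=5, buffer_size=10):
        self.frames = deque(maxlen=buffer_size)  # FIFO ring buffer: oldest frame dropped when full
        self.positions = deque(maxlen=required)  # steepest-edge positions of the last frames
        self.required = required

    def push(self, frame):
        self.frames.append(frame)
        self.positions.append(steepest_edge(frame))
        return len(self.positions) == self.required and len(set(self.positions)) == 1


def frame_with_spot(r, c, size=7, value=10.0):
    """Hypothetical test frame: uniform background with a single warm pixel."""
    img = [[0.0] * size for _ in range(size)]
    img[r][c] = value
    return img


det = ConsecutiveEdgeDetector()
hits = [det.push(frame_with_spot(3, 3)) for _ in range(5)]
print(hits)
```

The sketch makes the weaknesses named above visible: only the single global maximum of the edge image is used, the local contrast around it plays no role, and the fixed count of five consecutive frames is not adapted to changing conditions.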
Verification, localization and documentation are not essential components of the known processing method; the thesis mentions them only marginally. Regarding verification, it merely notes that stored data can be retrieved again via a playback function (compare page 52, Chapter 4.2.3 of the thesis). For localization in the sense of determining the position, global position data (GPS) can be integrated (compare page 54, Chapter 4.3.2 of the thesis). For documentation in the sense of making information available for further use, storage on durable data carriers (compare page 77 of the thesis, center) and chronological listing on a website (compare page 80 of the thesis, top) are mentioned.
Furthermore, individual aspects of the system that is comprehensively described in this patent application for the first time are known from “MAPS: an Automated Whale Detection System for Mitigation Purposes” (D. P. Zitterbart et al., SEG (Society of Exploration Geophysicists) Expanded Abstracts 30, 67 (2011), International Exhibition and 81st Annual Meeting, San Antonio, USA, Sep. 18-23, 2011; first published on the Internet; doi:10.1190/1.3628169). This system uses an infrared camera (FIRST-Navy) for whale observation, which is mounted on the mast of a research vessel. The infrared camera can observe a virtually full circle (300°) around the vessel, generates grayscale images and is gyroscopically compensated against vessel motion. Furthermore, a graphical user interface is shown (Tashtego desktop; Tashtego is software developed by the Alfred Wegener Institute), which displays the current video and the ten video sequences recorded before it. An enlarged image section and a recording loop of the most recently detected whale are likewise displayed. In the current image, orientation lines for the horizon and for various radii are displayed. Furthermore, images are shown with integrated zooms, distance data and water temperatures, and with vessel-referenced as well as geo-referenced cartographic records of detected whale blows. It is further stated that processing comprises detection, verification, localization and documentation. About the actual processing of the image data, however, the publication makes no statements beyond those in the above-cited thesis of Marc Ritter.
Furthermore, US 2010/0046326 A1 discloses a method for detecting whales which, however, is based on an acoustic principle, using sounds produced by the whale and externally generated sounds reflected by the whale.
Furthermore, airborne infrared cameras have been used to census Antarctic whale populations from the air. The publication “infrared whale counting” (Keith Dartez, available on the Internet at infraredinnature.blogspot.com/) describes that the footprint of whales can be detected in the thermographic image when the sea is completely calm. However, no automatic detection of these signatures is described. Moreover, airborne observation cannot be implemented logistically for the mitigation of noise-generating anthropogenic activities described above, since airplanes cannot monitor the surroundings of a vessel or a platform uninterruptedly for several months.