1. Field of the Invention
The present invention relates to systems and methods for estimating a frequency of rarely occurring events. More particularly, the present invention relates to estimating a magnitude of rare events upon receiving a complete data sample representing past events over a time period and a specific exceedance probability.
2. Description of the Prior Art
There has been a need to estimate, on the basis of a set of observations of some physical quantities, the sizes of events that will be exceeded with an extremely small probability. An extremely small probability might, for example, refer to a probability less than 0.001. Rare events would then refer to events whose occurrence probabilities are less than 0.001 in a particular time period. An example of a rare event may be a wind speed that is exceeded at a specific location once every 100 years on average. A magnitude of rare events refers to the physical magnitude associated with the events, e.g., the wind speed in km/hr. In statistics, techniques are used for estimating an extreme quantile of a distribution, given a random sample of data drawn from the distribution. The extreme quantile refers to a value within the range of possible values taken by the distribution, such that either the probability of observing a value larger than that value or the probability of observing a value smaller than that value is extremely small (e.g., less than 0.001). A distribution specifies the relative frequency of occurrence of events of different magnitudes. An example of estimating an extreme quantile of a distribution is estimating a wind speed that will be exceeded at a specific location once every 100 years.
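As an illustration of the terms above, the following minimal sketch converts a return period into an exceedance probability and reads an upper quantile directly from a sample. The function names and the assumption of independent, equally spaced observations are hypothetical and are not taken from the present disclosure.

```python
import math


def exceedance_probability(return_period_years, observations_per_year=1):
    """Per-observation probability of exceeding the T-year level.

    Assumes independent, identically distributed observations taken at a
    fixed rate (an illustrative simplification).
    """
    if return_period_years <= 0 or observations_per_year <= 0:
        raise ValueError("return period and observation rate must be positive")
    return 1.0 / (return_period_years * observations_per_year)


def empirical_upper_quantile(sample, p):
    """Observed value that is exceeded by a fraction p of the sample.

    Returns None when the sample is too small to resolve p, i.e. when
    fewer than one observation is expected to exceed the quantile.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    n = len(sample)
    # Number of sample values allowed to exceed the quantile; the small
    # offset guards against floating-point round-off in n * p.
    k = math.floor(n * p + 1e-9)
    if k < 1:
        return None
    ordered = sorted(sample, reverse=True)
    return ordered[k]  # exactly k values are larger (if there are no ties)
```

Under these assumptions, a sample of 1,000 observations cannot resolve an exceedance probability below 0.001 by direct counting, which is why an extreme quantile is usually extrapolated from a fitted distribution rather than read from the data.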
A traditional approach for estimating sizes of events that will be exceeded with an extremely small probability (e.g., a probability less than 0.001) is to assume a particular form for the distribution, to fit the distribution to different subsets of the largest observed data values, and to choose the one of these subsets that provides a “best” estimate of the extreme quantile. An observed value refers to a datum in a subset or an entire set. This traditional approach of choosing a subset providing the best estimate of the extreme quantile, or equivalently choosing an appropriate threshold such that estimates of extreme quantiles are based only on data that exceed the threshold, is called “adaptive thresholding”.
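The fitting step described above can be sketched as follows. This is only an illustration: it assumes, for simplicity, that excesses over the threshold follow an exponential distribution, which is one possible choice of distributional form and not necessarily the one used in any particular prior-art method. The function names are hypothetical.

```python
import math


def exponential_tail_quantile(sample, k, p):
    """Estimate the level exceeded with probability p from the k largest values.

    Illustrative assumption: excesses over the threshold are exponential.
    """
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    # Maximum-likelihood scale of the exponential fit is the mean excess.
    scale = sum(x - threshold for x in ordered[:k]) / k
    # Solve (k / n) * exp(-(q - threshold) / scale) = p for q.
    return threshold + scale * math.log(k / (n * p))


def quantile_estimates_by_subset(sample, ks, p):
    """Quantile estimate for each candidate subset size k."""
    return {k: exponential_tail_quantile(sample, k, p) for k in ks}
```

A selection rule would then compare the entries returned by `quantile_estimates_by_subset` across the candidate subset sizes and pick one of them; the rule itself is deliberately left out of this sketch.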
A criterion used to judge the quality of the estimates typically evaluates whether the distribution fitted to the data in a subset is correctly specified, i.e., whether the mathematical form of the fitted distribution matches the distribution from which the data are assumed to be drawn. For example, the criterion may be an estimated accuracy of the estimated extreme quantile under an assumption that the fitted distribution is correctly specified, or a measure of the fluctuations of an estimated tail index over different subsets. A tail of a distribution refers to each end (upper end or lower end) of the distribution. A tail index is a number that specifies a mathematical form of the upper end of the distribution. An estimated tail index is an estimate, computed from a set of data, of the tail index of the distribution from which the data were drawn.
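One way to picture an estimated tail index and its fluctuation over subsets is the sketch below. It uses the Hill estimator, a standard estimator for heavy (Pareto-type) upper tails, purely as an example; the text above does not name a specific estimator. The fluctuation measure shown (a standard deviation over candidate subset sizes) and the function names are likewise hypothetical.

```python
import math


def hill_tail_index(sample, k):
    """Hill estimate of the tail index from the k largest observations.

    Returns the extreme value index (the reciprocal of the Pareto
    exponent). Requires a positive threshold.
    """
    ordered = sorted(sample, reverse=True)
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the Hill estimator needs a positive threshold")
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


def tail_index_fluctuation(sample, ks):
    """Standard deviation of the Hill estimates over subset sizes ks."""
    estimates = [hill_tail_index(sample, k) for k in ks]
    mean = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))
```

A small fluctuation suggests that the estimated tail index is stable across the candidate subsets. As with the fitted-accuracy criterion, this check is only meaningful if the assumed tail form is appropriate for the data.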
However, this traditional approach is not always accurate because the distribution from which the data are drawn may be different from the distribution assumed in the fitting process (e.g., a process fitting a distribution to different subsets of the largest observed data values). This inaccuracy of the traditional approach can produce unreliable estimates of quantiles.
Thus, it is highly desirable to have a method and a system for choosing an optimal subset of data which increases the robustness of the fitting process and achieves consistency with smaller subsets (i.e., has the same characteristics as smaller subsets). Such a method and system should further compute a quantile estimate based on the optimal subset.