Not Applicable.
1. Field of the Invention
The present invention relates to a statistical analysis of data, specifically to a technology for estimating a measure of randomness of a function of at least one random variable.
2. Discussion of the Related Art
Frequently, performance measures of systems are based on the means of random variables. For example, the percentage of time that a machine is under repair is the mean repair time divided by the mean time between the beginnings of successive repairs. It is important to distinguish between the mean of a function of at least one random variable and the function of the means of at least one random variable. In the case of the machine repair, it would be possible to divide the individual repair times by the individual times between the beginnings of repairs, and to obtain the mean of this ratio. However, this mean of the function would differ from the function of the means. Only the function of the means represents the correct percentage of time that the machine is under repair.
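By way of a non-limiting illustration, the difference between the function of the means and the mean of the function may be seen in the following short Python sketch. The data values and variable names are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical data (hours): repair durations and the time between the
# starts of consecutive repairs, one pair per repair cycle.
repair_times = [2.0, 1.0, 6.0, 3.0]
cycle_times = [10.0, 20.0, 12.0, 30.0]

# Function of the means: mean repair time divided by mean cycle time,
# which equals total repair time divided by total elapsed time.
function_of_means = mean(repair_times) / mean(cycle_times)

# Mean of the function: average of the individual per-cycle ratios.
mean_of_function = mean(r / c for r, c in zip(repair_times, cycle_times))

print(f"function of the means: {function_of_means:.4f}")
print(f"mean of the function:  {mean_of_function:.4f}")
```

With these made-up numbers the machine is under repair for 12 of 72 hours, so the function of the means is about 0.167, whereas the mean of the individual ratios is about 0.213.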
Frequently, the means of the random variables are not known exactly, but rather are based on a set of collected data. Therefore, these means may differ from the true means. Consequently, the function of the means may differ from the function of the true means. Frequently, there is interest in a measurement of the accuracy of the function of the means. This measurement of accuracy is usually expressed as a confidence interval around the mean or median, but may also be expressed as a variance, a standard deviation, or a quantile. While the calculation of such measures is well known in statistical analysis for individual random variables, it is more difficult for functions of the means.
Common uses of the function of at least one mean are frequencies of occurrences, where the mean frequency is the inverse of the mean time between occurrences. Other common uses are percentages of time, where the mean percentage is the mean duration divided by the mean time between the starts of the duration's cycles.
One conventional method to calculate the confidence interval of the function of means is called batching, also known as the non-overlapping batch means method. In this method, a sufficiently large set of data is split into a number of subsets. The means for each subset are calculated, and subsequently the function of the means is calculated for each subset. A confidence interval can then be constructed from the different values of the function of means.
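The conventional batching method just described may be sketched, purely for illustration, as follows. The function name batch_means_ci, the restriction to a function of two means, and the practice of passing the Student-t critical value as an argument are assumptions made for this sketch and are not taken from any particular published implementation.

```python
from math import sqrt
from statistics import mean, stdev

def batch_means_ci(xs, ys, f, n_batches, t_crit):
    """Confidence interval for f(mean(xs), mean(ys)) by non-overlapping batches.

    t_crit is the Student-t critical value for n_batches - 1 degrees of
    freedom, supplied by the caller (for example 2.262 for 10 batches
    and a 95% two-sided interval).
    """
    size = len(xs) // n_batches          # leftover observations are dropped
    values = []
    for b in range(n_batches):
        start, stop = b * size, (b + 1) * size
        # Function of the means, evaluated separately on each subset.
        values.append(f(mean(xs[start:stop]), mean(ys[start:stop])))
    center = mean(values)
    half_width = t_crit * stdev(values) / sqrt(n_batches)
    return center - half_width, center + half_width
```

The sketch makes the dependence on the number of subsets explicit: both the subset size and the critical value change with n_batches.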
However, this conventional method is suitable only for sufficiently large sets of data in order to satisfy the central limit theorem. This method can therefore not be used on small data sets. In addition, the confidence interval for a set of data can vary significantly with the number of subsets used. The selection of an unsuitable number of subsets may cause incorrect results. Furthermore, this method requires significant storage capacity and computational power as the size of the data set increases. Finally, due to the nature of the computation, these intensive calculations have to be repeated every time additional data becomes available.
Many approaches have been developed to assist the selection of the number of subsets for the above batching method. However, they are usually very complicated and require a high level of expertise. In addition, the results of these approaches may differ from one another. Furthermore, the computational requirements increase even further, as these approaches frequently require a significant statistical effort to analyze the subsets and the relation therebetween.
A variant of the above conventional batching method, known as the overlapping batch means method, creates overlapping subsets. While this variant may offer a slight improvement over the basic batching method, it still requires a large data set, the selection of a number of subsets, and significant storage and computational capacity. Furthermore, the complexity of the variant is still significant and requires significant statistical knowledge.
It is therefore an object of the present invention to permit the estimation of a measure of randomness of a function of at least one representative value of at least one random variable, even for a relatively small size of data set to be used, in a reduced time.
The object may be achieved according to any one of the following modes of this invention. Each of these modes of the invention is numbered like the appended claims, and depends from the other mode or modes, where appropriate. This type of explanation about the present invention is for better understanding of some instances of a plurality of technical features and a plurality of combinations thereof disclosed in this specification, and does not mean that the plurality of technical features and the plurality of combinations in this specification are interpreted to encompass only the following modes of this invention:
(1) A method of estimating a measure of randomness of a function of at least one representative value of at least one random variable, comprising:
a step of obtaining the at least one random variable;
a step of determining the at least one representative value of the obtained at least one random variable;
a step of determining a statistic of the obtained at least one random variable;
a step of determining a gradient of the function with respect to the determined at least one representative value; and
a step of transforming the obtained statistic of the at least one random variable into a statistic of the function, using the determined gradient.
As a result of the inventor's research, the inventor has found the following statistical characteristic. A statistic of a function of a random variable, which statistic may include a measure of randomness or dispersion, strongly tends to reflect the corresponding statistic of the random variable. Where the gradient of the function is steep, the statistic of the random variable is reflected in the statistic of the function in an enlarged form, and where the gradient of the function is gentle, the statistic of the random variable is reflected in a reduced form.
In addition, the above research also revealed that utilizing the characteristic mentioned above would permit the estimation of a measure of randomness of a function of a representative value of a random variable with an accuracy almost equal to that of the conventional batching method described above, while using a smaller data set and requiring a shorter time than the batching method.
On the basis of the above findings, in the above mode (1) of the present invention, at least one representative value of at least one random variable is determined and a statistic of the at least one random variable is determined. Furthermore, in the mode (1), a gradient of a function of the at least one random variable with respect to the determined at least one representative value is determined, and, by the use of the determined gradient, the determined statistic of the at least one random variable is transformed into a statistic of the function.
Hence, the mode (1) would permit the estimation of a measure of randomness of a function of at least one representative value, using a smaller data set and a shorter time than the conventional batching method.
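One possible reading of the steps of the mode (1), offered only as a hedged sketch, is a first-order propagation in which the standard error of the mean is scaled by the absolute value of the gradient. This particular formula, and the names propagate_std_error and gradient_at, are assumptions made for illustration and are not recited in the modes themselves.

```python
from math import sqrt
from statistics import mean, stdev

def propagate_std_error(data, gradient_at):
    """First-order estimate of the standard error of f(mean(data)).

    gradient_at(m) must return the derivative of f at m (illustrative name).
    """
    m = mean(data)                            # representative value
    se_mean = stdev(data) / sqrt(len(data))   # statistic of the random variable
    return abs(gradient_at(m)) * se_mean      # statistic of the function

# Example: occurrence frequency f(m) = 1/m from times between occurrences.
times_between = [4.0, 6.0, 5.0, 7.0, 3.0]
se_freq = propagate_std_error(times_between, lambda m: -1.0 / m**2)
print(se_freq)
```

Under this reading, a steep gradient enlarges the dispersion of the random variable and a gentle gradient reduces it, in line with the characteristic described above.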
The term "representative value" may be defined, in the above mode (1) and other modes of the present invention, to mean a measure of central tendency of a distribution of a plurality of individual data values belonging to the at least one random variable or the function, for instance.
Further, in the case of a plurality of random variables or a plurality of functions, the term "representative value" may be defined, in the above mode (1) and other modes of the present invention, to mean a plurality of representative values for the plurality of random variables or functions, respectively, for instance.
In addition, the step of determining a gradient may be constituted to determine the gradient exactly or approximately. For example, the step of determining a gradient may be adapted to determine a gradient of the function exactly at the at least one representative value, or may be adapted to determine a gradient of the function in the vicinity of the at least one representative value.
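As one hedged example of an approximate determination of the gradient in the vicinity of the representative values, a central finite difference may be used. The function name central_difference_gradient and the step-size rule are assumptions made for this sketch.

```python
def central_difference_gradient(f, point, rel_step=1e-6):
    """Approximate the gradient of f at `point` (a list of floats)."""
    grad = []
    for i, x in enumerate(point):
        h = rel_step * max(abs(x), 1.0)      # step scaled to the coordinate
        up = list(point)
        up[i] = x + h
        down = list(point)
        down[i] = x - h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

# Fraction of time under repair, f(r, c) = r / c, at hypothetical means.
grad = central_difference_gradient(lambda p: p[0] / p[1], [3.0, 18.0])
print(grad)
```

For this function the exact partial derivatives are 1/c and -r/c squared, so the approximation can be checked directly against the closed form.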
Furthermore, the term "function" is interpreted, in the above mode (1) and other modes of the present invention, as an operator for associating the at least one random variable with at least one other variable, one example of which may be a performance function described below, which function associates the at least one random variable with a performance measure.
(2) The method according to the above mode (1), wherein the step of transforming comprises transforming the statistic of the at least one random variable into the statistic of the function, such that the statistic of the function responds to the statistic of the at least one random variable more sensitively in the case of the gradient being steep than in the case of the gradient being gentle.
In the above mode (2), in light of the statistical characteristic aforementioned, which has been recognized by the inventor, a statistic of at least one random variable is transformed into a statistic of a function, such that the statistic of the function responds to the statistic of the at least one random variable more sensitively in the case of the gradient being steep than in the case of the gradient being gentle.
(3) The method according to the above mode (1) or (2), wherein each one of the at least one representative value of the at least one random variable comprises at least one of an average, an arithmetic mean, a geometric mean, a median, a harmonic mean, and a mode, of each one of the at least one random variable.
(4) The method according to any one of the above modes (1) to (3), wherein the step of determining the at least one representative value comprises determining the at least one representative value of the at least one random variable, upon truncating at least one part of individual data values belonging to the at least one random variable.
In the above mode (4), the at least one representative value is determined after abnormal data have been removed from the plurality of individual data values by applying truncation to the original individual data values. This improves the accuracy of determining the at least one representative value and, in turn, the accuracy of estimating the randomness of the function of the at least one random variable.
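The mode (4) does not prescribe a particular truncation rule. A minimal sketch, assuming symmetric truncation of a fixed fraction of the sorted values from each tail (an assumption of this example only), is given below with hypothetical data containing one abnormal value.

```python
from statistics import mean

def truncated_mean(data, fraction=0.1):
    """Mean after dropping `fraction` of the values from each tail."""
    ordered = sorted(data)
    k = int(len(ordered) * fraction)     # number of values cut from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

# Hypothetical repair times (hours) with one abnormal observation.
repair_times = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 40.0]
print(mean(repair_times), truncated_mean(repair_times, 0.1))
```

Here the plain mean is pulled to about 5.8 by the single value of 40.0, while the truncated mean stays near 2.0.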
(5) The method according to any one of the above modes (1) to (4), wherein the statistic of each one of the at least one random variable comprises at least one of a standard deviation, a confidence interval, a set of data, a probability density function, and a cumulative density function, of the each random variable.
(6) The method according to any one of the above modes (1) to (5), wherein the statistic of the function comprises at least one of a standard deviation, a confidence interval, a set of data, a probability density function, and a cumulative density function, of the function.
(7) The method according to any one of the above modes (1) to (6), further comprising a step of estimating the measure of randomness of the function of the at least one representative value, on the basis of the statistic of the function.
(8) The method according to the above mode (7), wherein the measure of randomness comprises a range of a confidence interval of the function of the at least one representative value.
(9) The method according to the above mode (7) or (8), applied to a discrete-event simulation, the results of which are used to effect the method, wherein the step of estimating comprises estimating the measure of randomness using results of only one execution of the simulation.
The above mode (9) would permit the estimation of the randomness of the function of the at least one random variable in a shorter time than the conventional batching method aforementioned.
Furthermore, this mode (9) would reduce the length of time required for the randomness estimation described above for one simulation, and as a result, this mode (9) would make it easier to perform the randomness estimation for other simulations within a given time.
Consequently, in the case where a plurality of simulations of a system whose performance is to be investigated is required for the above randomness estimation, this mode (9) would permit the randomness estimation for that plurality of simulations in a shorter time than the conventional batching method mentioned before.
Thus, this mode (9) would also facilitate comparing the estimated measures of randomness for that plurality of simulations within a reduced time, facilitating an optimization of the system to be investigated using simulations, within a shorter time and with improved accuracy.
(10) The method according to the above mode (9), wherein an accuracy to be satisfied by the statistic of the function is predetermined, and the step of determining a statistic comprises:
(a) determining the statistic of the at least one random variable, on the basis of a sum of individual data values belonging to the at least one random variable;
(b) determining the statistic of the at least one random variable on the basis of the sum, upon adding to the sum at least one new individual data value belonging to the at least one random variable;
(c) determining the statistic of the at least one random variable when at least one new individual data value belonging to the at least one random variable becomes available during the simulation;
(d) transforming the determined statistic of the at least one random variable into the statistic of the function; and
(e) terminating the simulation when the predetermined accuracy is satisfied by the statistic of the function.
The above mode (10) would facilitate monitoring the increase in accuracy of the statistic of the function of the at least one random variable as the simulation progresses.
In addition, this mode (10) would facilitate automatically terminating the simulation when the predetermined accuracy of the statistic of the function of the at least one random variable is reached.
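The steps (a) to (e) of the mode (10) may be sketched as follows, on the assumptions that the statistic is a standard error maintained from running sums, that the transformation is a first-order scaling by the gradient, and that the accuracy target is a normal-approximation confidence half-width. The names RunningStat and run_until_accurate, the critical value 1.96, and the minimum sample count are all hypothetical choices made for this example.

```python
from math import sqrt
import random

class RunningStat:
    """Keeps only the count, the sum and the sum of squares."""
    def __init__(self):
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def std_error(self):
        # Sum-based variance; simple, though not the most numerically robust.
        var = (self.total_sq - self.n * self.mean() ** 2) / (self.n - 1)
        return sqrt(max(var, 0.0) / self.n)

def run_until_accurate(next_value, gradient_at, target_half_width,
                       z=1.96, min_n=30, max_n=1_000_000):
    stat = RunningStat()
    while stat.n < max_n:
        stat.add(next_value())           # one new value from the simulation
        if stat.n >= min_n:
            half_width = z * abs(gradient_at(stat.mean())) * stat.std_error()
            if half_width <= target_half_width:
                break                    # predetermined accuracy reached
    return stat

# Stand-in for a simulation: exponential times between occurrences.
rng = random.Random(0)
stat = run_until_accurate(lambda: rng.expovariate(1 / 5.0),
                          lambda m: -1.0 / m**2, target_half_width=0.01)
print(stat.n, 1.0 / stat.mean())
```

Because only the sums are stored, each new value updates the statistic in constant time and no earlier data need be retained or reprocessed.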
(11) The method according to any one of the above modes (1) to (10), wherein the function is a function of a plurality of random variables, the step of transforming comprising:
(a) determining a measure of randomness of each one of the random variables at or in the vicinity of a representative value of each one of the obtained plurality of random variables, as the statistic of each random variable;
(b) determining a measure of dependence between the plurality of random variables; and
(c) transforming the determined measures of randomness of the plurality of random variables into a measure of randomness of the function, using the determined measure of dependence and the determined gradient.
In the above mode (11), in the case of a plurality of random variables, the randomness of the function of the plurality of random variables is estimated by taking account of a measure of dependence between those random variables.
Consequently, this mode (11) would allow, in the case of a plurality of random variables, the accurate estimation of the randomness of the function of those random variables.
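For a function of a plurality of random variables, one hedged way to combine the measures of randomness, the measure of dependence and the gradient is the first-order quadratic form of the gradient with the sample covariance matrix, divided by the number of observations. This specific formula and the function names below are assumptions of the sketch; the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def sample_cov(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def propagated_std_error(samples, gradient):
    """sqrt(g^T C g / n), with C the sample covariance matrix.

    samples:  list of equally long sequences, one per random variable
    gradient: partial derivatives of f at the means, in the same order
    """
    n = len(samples[0])
    k = len(samples)
    var_f = sum(gradient[i] * gradient[j] * sample_cov(samples[i], samples[j])
                for i in range(k) for j in range(k)) / n
    return sqrt(max(var_f, 0.0))

repair = [2.0, 1.0, 6.0, 3.0]
cycle = [10.0, 20.0, 12.0, 30.0]
r_bar, c_bar = mean(repair), mean(cycle)
grad = [1.0 / c_bar, -r_bar / c_bar**2]      # gradient of f(r, c) = r / c
se = propagated_std_error([repair, cycle], grad)
print(se)
```

The diagonal terms of the double sum carry the variances of the individual random variables, and the off-diagonal terms carry their covariances.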
(12) The method according to the above mode (11), wherein the measure of randomness of the each random variable comprises at least one of a maximum likelihood estimator of a variance of the each random variable, an unbiased estimator of the variance, a maximum likelihood estimator of a standard deviation of the each random variable, an unbiased estimator of the standard deviation, a variance of a representative value of the each random variable, a standard deviation of a representative value of the each random variable, a coefficient of variation of the each random variable, a general central moment of the each random variable, a confidence interval of the each random variable, a set of data indicative of the each random variable, a probability density function of the each random variable, and a cumulative density function of the each random variable.
(13) The method according to the above mode (11) or (12), wherein the measure of dependence comprises at least one of an unbiased estimator of a covariance of the plurality of random variables, a maximum likelihood estimator of the covariance, and a correlation coefficient of the plurality of random variables.
(14) The method according to any one of the above modes (1) to (13), wherein the function is a function of a plurality of random variables, the step of transforming comprises transforming the obtained statistic of the plurality of random variables into the statistic of the function, without a calculation of a measure of dependence between the plurality of random variables.
In the above mode (14), in the case of a plurality of random variables, a statistic obtained for those random variables is transformed into a statistic of the function, without a calculation of the dependence between those random variables.
Thus, in the case where there is a plurality of random variables that are independent of each other, or dependent on each other only to a negligible degree, this mode (14) would permit the estimation of the randomness of the function of the random variables in a shorter time than when the transformation between statistics is performed after calculating the dependence between those random variables.
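When the dependence is neglected, a first-order propagation reduces to a sum over the individual variables only. The following sketch assumes that form; the function name is hypothetical, and the data are the same made-up values used for illustration elsewhere.

```python
from math import sqrt
from statistics import mean, variance

def propagated_std_error_independent(samples, gradient):
    """First-order standard error of f(means), ignoring all covariances."""
    var_f = sum(g * g * variance(s) / len(s)
                for s, g in zip(samples, gradient))
    return sqrt(var_f)

repair = [2.0, 1.0, 6.0, 3.0]
cycle = [10.0, 20.0, 12.0, 30.0]
r_bar, c_bar = mean(repair), mean(cycle)
grad = [1.0 / c_bar, -r_bar / c_bar**2]      # gradient of f(r, c) = r / c
se_indep = propagated_std_error_independent([repair, cycle], grad)
print(se_indep)
```

Since no cross terms are evaluated, the cost grows linearly with the number of random variables, and the variables need not have the same number of observations.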
(15) A method of determining a set of data of a function of a representative value of each one of at least one random variable, which set of data allows an evaluation of a statistic of the function, comprising:
a step of obtaining a set of individual data values belonging to each random variable, which set represents an approximation of a distribution of the each random variable;
a step of determining the representative value of the each random variable;
a step of determining a gradient of the function with respect to the determined representative value; and
a step of transforming the obtained set of individual data values into the set of data representing the function.
In the above mode (15), in light of the findings recognized by the inventor of the present invention, as described in relation to the above mode (1), a set of a plurality of individual data values belonging to the each random variable, which set represents an approximation of a distribution of the each random variable, is obtained, and a representative value of the each random variable is determined. Furthermore, in this mode (15), a gradient of the function of the at least one random variable with respect to the determined representative value is determined, and by the use of the determined gradient, the obtained set of individual data values for the at least one random variable is transformed into a set of data representing the values of the function.
Consequently, this mode (15) would permit the estimation of a measure of randomness of a function of at least one random variable in the form of a set of data representing the randomness, according to basically the same principle as the one accepted in the above mode (1).
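The mode (15) does not state the transformation explicitly. One possible sketch, assuming a linearization of the function about the representative value so that each individual data value is mapped through the gradient, is shown below. The name linearized_function_values and the data are hypothetical.

```python
from statistics import mean, stdev

def linearized_function_values(data, f, gradient_at):
    """Map each data value x to f(m) + f'(m) * (x - m), with m = mean(data)."""
    m = mean(data)
    g = gradient_at(m)
    return [f(m) + g * (x - m) for x in data]

# Times between occurrences mapped to a data set for the frequency 1/m.
times_between = [4.0, 6.0, 5.0, 7.0, 3.0]
freq_values = linearized_function_values(
    times_between, lambda m: 1.0 / m, lambda m: -1.0 / m**2)
print(mean(freq_values), stdev(freq_values))
```

Under this assumption the transformed set is centered on the function of the mean, and its spread equals the spread of the original set multiplied by the absolute value of the gradient.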
(16) The method according to the above mode (15), wherein the step of transforming comprises transforming the set of individual data values of the each random variable into the set of data representing the function, such that the set of data representing the function responds to the set of individual data values more sensitively in the case of the gradient being steep than in the case of the gradient being gentle.
(17) The method according to the above mode (15) or (16), further comprising a step of estimating a measure of randomness of the function of the representative value, on the basis of the set of data representing the function.
(18) The method according to the above mode (17), wherein the measure of randomness comprises a range of a confidence interval of the function of the representative value.
(19) The method according to the above mode (17) or (18), applied to a discrete-event simulation, the results of which are used to effect the method, wherein the step of estimating comprises estimating the measure of randomness using results of only one execution of the simulation.
The above mode (19) would provide basically the same operation and advantageous effects as the above mode (9) would.
(20) The method according to any one of the above modes (1) to (19), applied to an analysis of a plurality of business models to be accepted in realizing a given business, wherein a function of at least one random variable is predetermined for each one of the plurality of business models, and the function of a representative value of the each random variable for one of the plurality of business models is to be compared with the function of a representative value of the each random variable for another of the plurality of business models.
The above mode (20) would allow the determination of an accuracy of the function of the at least one random variable, for each business model.
In addition, this mode (20) would permit the determination of the likelihood of one business model outperforming another business model.
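As a hedged illustration of how the likelihood of one business model outperforming another might be evaluated, the sketch below assumes that the two estimated function values are independent and approximately normally distributed with known standard errors. These distributional assumptions, the function name prob_a_exceeds_b and the numerical values are not taken from the modes above.

```python
from math import sqrt
from statistics import NormalDist

def prob_a_exceeds_b(value_a, se_a, value_b, se_b):
    """P(A > B) for independent, approximately normal estimates."""
    diff_sd = sqrt(se_a**2 + se_b**2)        # standard error of A - B
    # A - B is modeled as normal; P(A - B > 0) = 1 - CDF(0).
    return 1.0 - NormalDist(value_a - value_b, diff_sd).cdf(0.0)

# Hypothetical performance values and standard errors of two models.
p = prob_a_exceeds_b(105.0, 4.0, 100.0, 3.0)
print(f"{p:.4f}")
```

In this made-up case the difference of 5 is one standard error of the difference, which gives a probability of about 0.84.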
(21) A method of estimating a measure of randomness of at least one random variable to satisfy a predetermined condition regarding a measure of randomness of a function of at least one representative value of the at least one random variable, the predetermined condition being formulated to define a central location and a measure of dispersion, of a distribution of the function, comprising:
a step of determining a gradient of the function with respect to the defined central location; and
a step of determining the measure of randomness of the at least one random variable, on the basis of the determined gradient and the defined measure of dispersion.
As is apparent from the previous explanation regarding the above mode (1), it is possible to mutually associate a measure of randomness of at least one random variable, and a measure of randomness of a function of at least one representative value of the at least one random variable. This means that, the use of a gradient of the function would permit not only a forward estimation to estimate a measure of randomness of the function of the at least one representative value of the at least one random variable, from a measure of randomness of the at least one random variable, but also a backward estimation to estimate a measure of randomness of the at least one random variable, from a measure of randomness of the function of the at least one representative value of the at least one random variable.
In view of the above findings, in the above mode (21), a condition to be satisfied by a measure of randomness of a function of at least one representative value of at least one random variable is predetermined, where the predetermined condition defines a central location of a distribution of the function, and a measure of dispersion of the distribution. Furthermore, in this mode (21), a gradient of the function with respect to the defined central location is determined, and a measure of randomness of the at least one random variable is determined on the basis of the determined gradient and the defined measure of dispersion.
(22) The method according to the above mode (21), wherein the step of determining the measure comprises transforming the defined measure of dispersion into the measure of randomness of the at least one random variable, such that the measure of randomness of the at least one random variable responds to the defined measure of dispersion more sensitively in the case of the gradient being steep than in the case of the gradient being gentle.
In the above mode (22), by the use of a gradient of the function, and according to the principle accepted in the above mode (2) applied with the necessary changes, the defined measure of dispersion of the function is transformed into a measure of randomness of the at least one random variable.
(23) The method according to the above mode (21) or (22), wherein the measure of dispersion comprises at least one of a standard deviation, a confidence interval, a set of data, a probability density function, and a cumulative density function, of the function.
(24) The method according to any one of the above modes (21) to (23), wherein the measure of randomness of each one of the at least one random variable comprises at least one of a standard deviation, a confidence interval, a set of data, a probability density function, and a cumulative density function, of the each random variable.
(25) A computer program to be executed by a computer to effect the method according to any one of the above modes (1) to (24).
When a computer program according to the above mode (25) is executed by a computer, the same advantageous effects would be provided, according to basically the same principle as one accepted in a method set forth in any one of the above modes (1) to (24).
The term "program" may be interpreted to include, not only a set of instructions to be executed by a computer so that the program may function, but also any files and data to be processed by the computer according to the set of instructions.
(26) A computer-readable storage medium having stored therein the computer program according to the above mode (25).
When the program having been stored in a computer-readable storage medium is executed by a computer, the same advantageous effects would be provided, according to basically the same principle as one accepted in a method set forth in any one of the above modes (1) to (24).
The term "storage medium" may be realized in different types, including, for example, a magnetic recording medium such as a floppy disk, an optical recording medium such as a CD and a CD-ROM, a magneto-optical recording medium such as an MO, and a non-removable storage such as a ROM.