1. Field of the Invention
The present invention relates to a distribution goodness-of-fit test device that judges, based on approximation to a chi-square distribution, whether measured data follows a discrete distribution such as a binomial distribution or a Poisson distribution.
2. Description of the Related Art
Conventionally, whether data obtained by measurement follows a discrete distribution such as a binomial distribution or a Poisson distribution is judged with a chi-square goodness-of-fit test, which tests the goodness of fit by approximation to a chi-square distribution. In this test method, whether the number of persons, cars, packets or the like arriving per fixed interval follows a multinomial distribution is tested by approximation to the chi-square distribution.
The chi-square goodness-of-fit test for a Poisson distribution is described below by use of a specific example. Table 1 shows the number of calls arriving on a certain telephone line on weekdays, recorded for 30 days. Table 2 classifies these data according to the arrival number.
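TABLE 1

| day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x | 6 | 3 | 3 | 5 | 3 | 5 | 5 | 4 | 5 | 6 | 7 | 1 | 8 | 5 | 5 |

| day | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x | 5 | 10 | 8 | 3 | 5 | 4 | 7 | 8 | 3 | 6 | 3 | 3 | 3 | 9 | 2 |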
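TABLE 2

| arrival number | observed frequency | frequency × arrival number | expected frequency |
|---|---|---|---|
| 0 | 0 | 0 | 0.202138411 |
| 1 | 1 | 1 | 1.010692052 |
| 2 | 1 | 2 | 2.526730125 |
| 3 | 8 | 24 | 4.211216874 |
| 4 | 2 | 8 | 5.264021093 |
| 5 | 8 | 40 | 5.264021093 |
| 6 | 3 | 18 | 4.386684244 |
| 7 | 2 | 14 | 3.133345889 |
| 8 | 3 | 24 | 1.95834118 |
| 9 | 1 | 9 | 1.087967322 |
| 10 | 1 | 10 | 0.543983661 |
| 11 | 0 | 0 | 0.247265301 |
| 12 | 0 | 0 | 0.103027209 |
| total | 30 | 150 | |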
In a conventional goodness-of-fit test of a Poisson distribution using a chi-square distribution, an arrival rate is first estimated as Σ (arrival number×observed frequency)/(total observed frequency). Table 2 lists the product of each arrival number and its observed frequency. These products sum to 150, and dividing this sum by the total observed frequency of 30 gives an estimated arrival rate λ of 5. FIG. 1 shows a graph in which the Poisson distribution with the arrival rate λ=5 is overlaid on the actually observed values. Incidentally, the arrival rate is the average number of arrivals per unit time, and corresponds to the mean of the Poisson distribution.
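The estimate can be reproduced with a minimal Python sketch. The variable names are illustrative and do not appear in the original description; the daily counts are transcribed from Table 1.

```python
from collections import Counter

# Daily arrival numbers transcribed from Table 1 (days 1 to 30).
arrivals = [6, 3, 3, 5, 3, 5, 5, 4, 5, 6, 7, 1, 8, 5, 5,
            5, 10, 8, 3, 5, 4, 7, 8, 3, 6, 3, 3, 3, 9, 2]

# Observed frequency of each arrival number, as tabulated in Table 2.
observed = Counter(arrivals)

# Sum of (arrival number x observed frequency) over all arrival numbers.
weighted_sum = sum(x * f for x, f in observed.items())
total_observed = sum(observed.values())

# Estimated arrival rate (lambda): mean number of arrivals per day.
arrival_rate = weighted_sum / total_observed

print(weighted_sum, total_observed, arrival_rate)
```

Tallying with `Counter` mirrors the classification step that turns Table 1 into Table 2, so the same frequencies can be reused in later steps of the test.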
Next, an expected frequency is obtained from the estimated arrival rate. Once the arrival rate is estimated, the shape of the Poisson distribution is determined by this arrival rate. The expected frequency is the frequency that would be observed if the data followed the estimated Poisson distribution exactly. For example, in the observed data shown in Table 1, an arrival number of 4 is observed twice, whereas the expected frequency of the arrival number 4 on the Poisson distribution is 5.264021093. The expected frequency of an arrival number x can be obtained by the following mathematical expression (1). Table 2 shows the expected frequencies at the respective arrival numbers.
$$\text{total observed frequency} \times \left( \frac{1}{e^{\lambda}} \cdot \frac{\lambda^{x}}{x!} \right) = 30 \left( \frac{1}{e^{5}} \cdot \frac{5^{x}}{x!} \right) \qquad (1)$$
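As a hedged illustration, the expected frequencies of Table 2 can be recomputed from the Poisson probability e^(−λ)·λ^x/x! multiplied by the total observed frequency. The function name `expected_frequency` is hypothetical and chosen only for this sketch.

```python
import math


def expected_frequency(x, arrival_rate, total_observed):
    """Expected number of periods with exactly x arrivals under a Poisson model."""
    # Poisson probability of x arrivals: exp(-lambda) * lambda^x / x!
    poisson_probability = (math.exp(-arrival_rate) * arrival_rate ** x
                           / math.factorial(x))
    return total_observed * poisson_probability


# Expected frequencies for arrival numbers 0 to 12 with lambda = 5 and 30 days.
expected = {x: expected_frequency(x, 5.0, 30) for x in range(13)}

for x, value in expected.items():
    print(x, round(value, 9))
```

For the arrival number 4 this gives approximately 5.264021093, the value quoted in the text.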
Next, a test statistic is obtained by using mathematical expression (2) set forth below, where Xi denotes the observed frequency in counting section i of the arrival number, and Ei denotes the expected frequency in counting section i of the arrival number. The counting sections in which the expected frequency is 1 or more are left unchanged, the counting sections in which the expected frequency is less than 1 are combined into a single section, and the resulting total number of counting sections is denoted m.
$$\chi_0^2 = \sum_{i=1}^{m} \frac{(X_i - E_i)^2}{E_i} \qquad (2)$$
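The combining of counting sections and the summation can be sketched in Python as follows. This is an illustration under one stated assumption: the description does not say how arrival numbers above 12 are handled, so the sketch assigns the combined section all expected frequency not covered by the unchanged sections. The variable names are illustrative.

```python
import math

# Observed frequencies by arrival number, taken from Table 2.
observed = {0: 0, 1: 1, 2: 1, 3: 8, 4: 2, 5: 8, 6: 3,
            7: 2, 8: 3, 9: 1, 10: 1, 11: 0, 12: 0}
total_observed = sum(observed.values())  # 30 observation days
arrival_rate = 5.0                       # estimated lambda


def expected_frequency(x):
    """Expected frequency of arrival number x under Poisson(arrival_rate)."""
    probability = (math.exp(-arrival_rate) * arrival_rate ** x
                   / math.factorial(x))
    return total_observed * probability


# Sections with an expected frequency of 1 or more are kept unchanged.
kept = [(observed[x], expected_frequency(x))
        for x in sorted(observed) if expected_frequency(x) >= 1.0]

# All remaining arrival numbers are combined into a single section.
# Assumption: the combined expected frequency is the remainder of the total,
# so it also covers arrival numbers above 12 that Table 2 does not list.
pooled_observed = total_observed - sum(o for o, _ in kept)
pooled_expected = total_observed - sum(e for _, e in kept)

sections = kept + [(pooled_observed, pooled_expected)]
m = len(sections)

# Test statistic: sum over sections of (observed - expected)^2 / expected.
statistic = sum((o - e) ** 2 / e for o, e in sections)

print(m, round(statistic, 6))
```

Under this assumption the sketch yields m = 10 sections and a statistic close to the value 9.208023 given for this example.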
This test statistic is compared with a chi-square value χ²(m−2, α) with m−2 degrees of freedom, where α represents a significance level. When the test statistic is larger than the chi-square value χ²(m−2, α), it is judged that “the observed data does not follow the estimated Poisson distribution”, and when the test statistic is smaller than the chi-square value χ²(m−2, α), it is judged that “it cannot be said that the observed data does not follow the estimated Poisson distribution”.
In the foregoing example, the test statistic is χ₀² = 9.208023 < χ²(8, 0.05) = 15.50731249, and the test result is that “it cannot be said that the observed data does not follow the Poisson distribution”.
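The comparison can be illustrated with a short Python sketch. It uses the statistic and the critical value quoted above, and additionally computes the upper-tail probability with the standard closed form that holds for an even number of degrees of freedom. The function name `chi2_upper_tail_even_df` is hypothetical, and the closed form is a convenience of this sketch, not something the description prescribes.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive even df.

    Uses exp(-x/2) * sum_{j < df/2} (x/2)^j / j!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    partial_sum = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * partial_sum


statistic = 9.208023          # test statistic quoted in the text
m = 10                        # counting sections, since m - 2 = 8
degrees_of_freedom = m - 2
alpha = 0.05                  # significance level
critical_value = 15.50731249  # chi-square value (8, 0.05) quoted in the text

# Decision rule from the text: reject only if the statistic exceeds the
# critical value.
reject = statistic > critical_value

# Equivalent view: reject only if the upper-tail probability is below alpha.
p_value = chi2_upper_tail_even_df(statistic, degrees_of_freedom)

print(reject, p_value < alpha)
```

Both views agree here: the statistic is below the critical value and the upper-tail probability is well above 0.05, so the Poisson hypothesis is not rejected.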
Patent document 1 discloses a method of testing the goodness of fit of a software reliability growth curve by using the property of bugs of computer programs as a sequence statistic and censoring data.
Patent document 2 proposes a method of judging whether or not a difference in observation condition influences the way an event occurs, by comparing two sets of data obtained by observation performed the same number of times while the observation condition is changed.
[Patent document 1] Japanese Patent No. 2693435
[Patent document 2] JP-A-2003-281116
Unless the expected frequency is 10 or more, the test statistic is not accurate enough for the approximation to the chi-square distribution. The number of data points must therefore be at least 10 times the number of classifications. However, when the number of classifications is large, there arises a problem that collecting the data takes time and cost. Further, since the observed data is not obtained uniformly across the classification sections, there are classification sections in which the expected frequency does not become large even if the number of data points is increased. In this case, the classifications in which the expected frequency is small must be combined, and there arises a problem that this processing takes time.
Besides, as is apparent from the timetables of trains or buses, an arrival rate (the average number of arrivals per unit time) is normally not constant, and arrivals follow an inhomogeneous Poisson process in which the arrival rate varies. However, because the distribution shape itself changes with the arrival rate in the inhomogeneous Poisson process, it has been impossible to apply the chi-square goodness-of-fit test as it is.
Patent documents 1 and 2 do not disclose techniques to solve the problems as stated above.