Statistical information can be culled from large data sets. However, it can be impractical or impossible to store a large data set, and likewise impractical or impossible to perform a statistical analysis on it.
Conventional methods and systems sample data from a large data set in order to analyze the large data set. However, such conventional methods and systems do not compensate for a common problem of skewed traffic distribution, in which the traffic associated with a small number of particular addresses accounts for a large portion of the total traffic associated with a large number of addresses. If this skew is not accounted for, sampling traffic from the addresses associated with disproportionately large amounts of traffic can introduce errors, in addition to being inefficient and overly time-consuming.
Such conventional methods and systems have generally been considered satisfactory for their intended purpose. However, there is still a need in the art to compensate for skewed network traffic in an efficient manner that preserves accuracy. The present disclosure provides a solution for these problems.