When data is plotted in a histogram, scatterplot, or other chart, the data is often placed into bins so that it can be interpreted more readily. For example, to plot employee ages, rather than drawing a bar for each employee's age, the ages can be grouped into five-year bins so that a separate bar is plotted for ages 20-25, 25-30, etc.
Conventional data binning analysis defines N bins, each having a width W. If the range of the data is not known beforehand, the data is first scanned to determine the range. After this initial scan, either the bin width (W) is calculated from the desired bin count (N), or the bin count (N) is determined from a desired bin width (W).
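The conventional approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation taken from any particular system: the function name `two_pass_bin_counts`, its parameters, the equal-width formula W = (max - min) / N, and the choice to place the maximum value in the last bin are all hypothetical choices made for the example.

```python
import math


def two_pass_bin_counts(values, bin_count=None, bin_width=None):
    """Count values per equal-width bin using two full reads of the data.

    `values` is assumed to be a non-empty sequence that can be read twice.
    Exactly one of `bin_count` (N) or `bin_width` (W) must be given.
    Returns (counts, minimum, width).
    """
    if (bin_count is None) == (bin_width is None):
        raise ValueError("specify exactly one of bin_count or bin_width")

    # Pass 1: scan every value to find the range (minimum and maximum).
    lo = hi = values[0]
    for v in values:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    span = hi - lo

    if bin_count is not None:
        # N is given: derive W from the range (fall back to 1.0 if all values are equal).
        width = span / bin_count if span > 0 else 1.0
    else:
        # W is given: derive N from the range, using at least one bin.
        width = bin_width
        bin_count = max(1, math.ceil(span / width))

    # Pass 2: read every value again and assign it to a bin.
    counts = [0] * bin_count
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bin_count - 1)
        counts[idx] += 1
    return counts, lo, width


ages = [21, 23, 27, 34, 38, 45]
counts_by_width, minimum, width = two_pass_bin_counts(ages, bin_width=5)
counts_by_count, _, derived_width = two_pass_bin_counts(ages, bin_count=4)
```

Both loops traverse the entire data set, so the cost of reading the data is incurred twice regardless of whether N or W is the fixed parameter.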
However, this conventional analysis requires that the data be read twice: once to obtain the range (the minimum and maximum values of the data set) and again to assign the data to the bins. For large data sets, performing binning this way is inefficient and costly because every value must be read a second time. Therefore, there is a strong need for a cost-effective solution that overcomes the above issue. The present invention addresses such a need.