Introduction
The gradual accumulation of spatial data in all fields of science, together with the increasing capacity of data acquisition equipment such as LIDAR (Light Detection And Ranging) for airborne topographic mapping, is giving rise to an exponentially growing volume of information. This growth outpaces Moore's law, under which computation capacity doubles roughly every 18 months. The common approach, when carrying out analyses on computing platforms, is to throw more computational resources at the problem. This approach is reaching its scalability limit and, once again, the development of smart and efficient algorithms is becoming paramount for the practical processing of gargantuan data sets.
Spatial analysis based on regular rectangular grids is one method of improving the efficiency of data processing and modeling. The regular grid affords many advantages in statistical analysis, frequency domain analysis, linear or non-linear modeling and so on. In addition, the regularity of rectangular grids allows the use of hardware-assisted vector processing techniques, which further leverage Moore's law.
Problem Statement
Most collected data is sampled in irregular or semi-regular patterns; therefore an important step in data processing is the marshalling of data into a uniform spatial grid. This is done by generating a grid G of M by N cells from P distributed sampling points characterized by their value Zp (written Vp in the formula below) and their coordinates Xp and Yp. The prototypical gridding method uses Inverse Distance Weighting (IDW) to interpolate a value at each grid point, which involves the following computation for each grid cell Gn,m, where Δx and Δy are the grid spacings:
$$G_{n,m} = \frac{\sum_{p=1}^{P} V_p \,/\, \left[ (X_p - m\,\Delta x)^2 + (Y_p - n\,\Delta y)^2 \right]}{\sum_{p=1}^{P} 1 \,/\, \left[ (X_p - m\,\Delta x)^2 + (Y_p - n\,\Delta y)^2 \right]}$$
This requires P compound operations for each of the M×N cells, for a total of M×N×P compound operations. Other techniques, such as kriging, triangulated irregular network (TIN) and spline interpolation, also involve the sequential revisiting of each of the P sampling points for each of the M×N grid points.
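The brute-force computation described above can be sketched as follows. This is a minimal illustration only, not a definitive implementation: the function name idw_grid, the (x, y, v) tuple layout, the eps threshold and the operation counter are all assumptions introduced for the example. It uses the squared distance as the weight denominator and assumes at least one sampling point is supplied.

```python
def idw_grid(points, m_cols, n_rows, dx, dy, eps=1e-12):
    """Brute-force inverse-distance-weighted gridding (illustrative sketch).

    points : non-empty list of (x, y, v) samples (hypothetical layout)
    Returns (grid, ops), where grid[n][m] is the value interpolated at
    (m * dx, n * dy) and ops counts the per-sample compound operations.
    """
    grid = [[0.0] * m_cols for _ in range(n_rows)]
    ops = 0
    for n in range(n_rows):
        for m in range(m_cols):
            num = 0.0  # running sum of v / d^2
            den = 0.0  # running sum of 1 / d^2
            for x, y, v in points:  # every cell revisits every sample
                d2 = (x - m * dx) ** 2 + (y - n * dy) ** 2
                ops += 1
                if d2 < eps:
                    # Assumption: a sample sitting on the grid node
                    # supplies the value directly (avoids dividing by 0).
                    num, den = v, 1.0
                    break
                w = 1.0 / d2
                num += w * v
                den += w
            grid[n][m] = num / den
    return grid, ops


if __name__ == "__main__":
    samples = [(0.5, 0.5, 1.0), (1.5, 0.5, 3.0)]
    grid, ops = idw_grid(samples, m_cols=3, n_rows=2, dx=1.0, dy=1.0)
    print(ops)  # 3 * 2 * 2 = 12 compound operations
```

The three nested loops make the M×N×P cost explicit: the innermost loop runs once per sampling point for every grid cell, whichever weighting function is used.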
If the grid is to approximate the spatial resolution of the original data set, one would expect the number of grid points M×N to be proportional to the number of sampling points P. It follows that the processing power needed for gridding a data set grows with P². Clearly such brute force gridding techniques become untenable with very large data sets.
Two factors affect the scalability of large data set analysis: processing power (i.e. number of operations required) and memory usage (i.e. number and size of storage units). Often one of these is compromised to optimize the other. A modest geographic data set by today's standard might be a digital terrain model of an urban area having point samples at an average resolution of 1 m over an area of 10 km×10 km (100 km²). This is 100,000,000 data points, each with an X, Y and Z value, which if represented in double precision translates to roughly 2.3 GB of information.
The popular LiDAR (Light Detection and Ranging) technology now commonly delivers as many as 1 to 2 billion X, Y, Z coordinates in a single surveyed region. This represents 20 to 40 GB of raw information. This will stress the memory capacity of many current computer systems, especially if more fields are needed for the storage of intermediate results.
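The storage figures above can be checked with a short calculation. The sketch below assumes three fields (X, Y, Z) per point, each held as an 8-byte double with no indexing or container overhead; the function name raw_size_bytes is hypothetical.

```python
BYTES_PER_DOUBLE = 8  # IEEE 754 double precision


def raw_size_bytes(num_points, fields=3):
    """Raw storage for num_points samples of `fields` doubles each."""
    return num_points * fields * BYTES_PER_DOUBLE


if __name__ == "__main__":
    # 10 km x 10 km at 1 m spacing -> 10,000 x 10,000 samples
    urban_points = 10_000 * 10_000
    for label, count in [("urban terrain model", urban_points),
                         ("LiDAR survey, 1 billion points", 1_000_000_000),
                         ("LiDAR survey, 2 billion points", 2_000_000_000)]:
        size = raw_size_bytes(count)
        print(f"{label}: {size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")
```

Under these assumptions the urban example comes to 2.4 × 10⁹ bytes, which agrees with the approximate figure quoted above. Each additional double-precision field kept per point for intermediate results adds a further third of the raw size.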
With processing requirements of the order of O(P²) and assuming only one operation per point, current hardware requires hundreds of hours of calculations. This clearly places interactive processing out of reach and reduces the feasibility of exploratory analysis, that is, performing "what-if" scenarios to aid in decision making.
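The order of magnitude of this estimate can be reproduced with a simple calculation. The sketch below is illustrative only: the sustained throughput of 10¹⁰ operations per second is an assumed figure rather than a measurement, and the function name brute_force_hours and its cells_per_point parameter (the ratio of grid cells to sampling points, taken as 1) are hypothetical.

```python
def brute_force_hours(num_points, ops_per_second, cells_per_point=1.0):
    """Hours needed for M*N*P operations when M*N = cells_per_point * P."""
    total_ops = cells_per_point * num_points * num_points
    return total_ops / ops_per_second / 3600.0


if __name__ == "__main__":
    ASSUMED_OPS_PER_SECOND = 1e10  # assumption, not a benchmark result
    p = 100_000_000                # the 10 km x 10 km urban example
    print(f"{brute_force_hours(p, ASSUMED_OPS_PER_SECOND):.0f} hours")
    print(f"{brute_force_hours(2 * p, ASSUMED_OPS_PER_SECOND):.0f} hours")
```

Because the cost is quadratic in P, doubling the number of sampling points quadruples the estimated running time, whatever throughput is assumed.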