Many geospatial datasets are stored in the form of data grids that are geo-referenced to the surface of the Earth. For example, nearly all weather forecasts, along with historical and current-condition weather analyses, are stored in a gridded data format. Weather information for performing forecasts and analyses is traditionally created, consumed, and stored as two-dimensional grids of values (or stacks of such grids). Each grid is made up of grid cells; each grid cell covers a region (e.g., a 5 km by 5 km square) and holds a single value.
Each grid covers some portion of the Earth, such as North America, Australia, the continental United States, the Great Lakes region (for example, for lake effect snow events), or even the entire globe. Different grids have different cell sizes, often based on the coverage of the grid. As an example, one grid may store the forecasted air temperature values across the continental US for 8 pm this evening. The 9 pm air temperature forecast is stored in a different grid. Wind speed, wind direction, precipitation type, precipitation rate, and other parameters are likewise each stored in separate grids, one for each time period. The number of grids across which data for a single location is spread can quickly grow into the thousands.
This manner of storing gridded data in the existing art is efficient for producers of the data, such as weather models, when generating and storing information. However, it is not efficient for quickly accessing data for a single location or a small number of locations for subsequent application or other use. In other words, database storage paradigms in the existing art work well when providing input to the database, but they do not work well when one wants to extract data for subsequent use.
As an example of the problems experienced with this existing approach to storing weather data, consider the effort involved in producing a detailed hour-by-hour forecast over a 48-hour period for a specific location. For each hour, data must be extracted for several different parameters, such as air temperature, dew point, wind speed, wind direction, wind gusts, visibility, type of visibility obstructions, cloud cover, and total probability of precipitation. Further, for precipitation, one has to extract both probability and rate for each of the different precipitation types (i.e., rain, snow, slush, and ice). Data may be needed for 20 or more different parameters for each of the 48 hours, resulting in on the order of a thousand or more separate values (20 parameters over 48 hours is already 960 values).
Utilizing grids as in the existing art to build this forecast first requires that the location, represented by latitude and longitude, be transformed to the grid cell index that covers that location. As an example, for Grand Forks, N. Dak. the latitude is 47.924663 and the longitude is −97.033997. In a 10-km grid that covers the continental United States, this may map to the grid cell at row 233, column 261.
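The transformation from a location to a grid cell index can be sketched as follows. This is a minimal illustration only, assuming a hypothetical regular latitude/longitude grid with 0.1-degree cells whose row 0 lies along the northern edge. The function name, grid origin, cell size, and grid dimensions are assumptions made for the example. Operational weather grids commonly use a map projection instead, so this sketch does not reproduce the row 233, column 261 result given above.

```python
import math


def latlon_to_cell(lat, lon, lat_north, lon_west, dlat, dlon, n_rows, n_cols):
    """Map a latitude/longitude to (row, col) on a regular lat/lon grid.

    Row 0 is the northernmost row and column 0 is the westernmost column.
    All grid parameters are hypothetical and supplied by the caller.
    """
    row = math.floor((lat_north - lat) / dlat)
    col = math.floor((lon - lon_west) / dlon)
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("location lies outside the grid")
    return row, col


# Hypothetical grid: 50N to 24N and 125W to 65W at 0.1-degree spacing.
cell = latlon_to_cell(
    lat=47.924663, lon=-97.033997,   # Grand Forks, N. Dak.
    lat_north=50.0, lon_west=-125.0,
    dlat=0.1, dlon=0.1,
    n_rows=260, n_cols=600,
)
print(cell)  # (20, 279) for this assumed grid
```

The same index applies to every grid that shares this geometry, so it needs to be computed only once per location; the cost described in this section comes from what must be done with the index afterward.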
Building this forecast with a gridded data set requires that each of the thousand or so different grids (which are often stored as separate files) be opened, and that a single value at the (233, 261) position be extracted from each grid for the forecast. This is an expensive and time-consuming process. Grids are often stored in a compressed format, which requires that a file first be opened, read, and uncompressed before the single value needed can be extracted. Even if the grids are stored uncompressed, for each grid a file must first be opened, then searched for the value needed, and then closed.
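The access pattern described above can be illustrated with a small sketch. It assumes a hypothetical on-disk layout in which each grid is a gzip-compressed file of 32-bit floats in row-major order, named by parameter and forecast hour. The file naming, grid dimensions, and helper functions are assumptions made for the example and do not correspond to any particular weather data format. Only three parameters and four hours are written here so that the example runs quickly.

```python
import gzip
import struct
import tempfile
from pathlib import Path

N_ROWS, N_COLS = 300, 400  # hypothetical grid dimensions


def write_grid(path, fill_value):
    """Write one gzip-compressed grid of 32-bit floats in row-major order."""
    count = N_ROWS * N_COLS
    data = struct.pack(f"<{count}f", *([fill_value] * count))
    with gzip.open(path, "wb") as f:
        f.write(data)


def read_cell(path, row, col):
    """Open and fully decompress one grid file to obtain a single value."""
    with gzip.open(path, "rb") as f:
        raw = f.read()  # the whole grid is decompressed for one cell
    offset = (row * N_COLS + col) * 4  # 4 bytes per 32-bit float
    return struct.unpack_from("<f", raw, offset)[0]


def build_forecast(grid_dir, params, hours, row, col):
    """Collect one value per (parameter, hour); one file open per value."""
    forecast = {}
    for param in params:
        for hour in range(hours):
            path = grid_dir / f"{param}_{hour:02d}.grd.gz"
            forecast[(param, hour)] = read_cell(path, row, col)
    return forecast


params = ["air_temp", "dew_point", "wind_speed"]  # hypothetical names
hours = 4

with tempfile.TemporaryDirectory() as tmp:
    grid_dir = Path(tmp)
    for param in params:
        for hour in range(hours):
            # Each grid is filled with its hour number as a placeholder value.
            write_grid(grid_dir / f"{param}_{hour:02d}.grd.gz", float(hour))
    forecast = build_forecast(grid_dir, params, hours, row=233, col=261)

print(len(forecast))  # 12 values, each requiring a separate file to be opened
```

With 20 or more parameters and 48 hours, the same loop would open, decompress, and discard roughly a thousand grids in order to retain one value from each.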
Scaling such an approach to efficiently serve thousands of forecasts a second represents a serious challenge, as it simply wastes too many processing resources. Therefore, there is a need not found in the existing art for a quick, efficient, and cost-effective database protocol and method for storing, arranging, and extracting gridded data for different uses, such as in weather data provision.