The present invention relates to a method and system for counting the number of households within a particular region, and in particular relates to such a method and system that accounts for seasonal households and provides accurate household counts in high-growth areas.
A precise knowledge of the number of households that are located within a particular geographic region is desirable for a number of different applications. One application of such knowledge is in site analysis. The purpose of a site analysis is to determine whether a geographic area is a desirable location for a retail business. Such site analysis is commonly approached in two different ways. In a first approach, a particular candidate site is chosen and an analysis is performed with respect to the geographic area that a retail business located at that site would serve. In a second approach, a larger geographic area is analyzed to determine which sub-areas within that area would be the best candidates for a new retail location. In either approach, a precise determination of the number of households (or population) in the region of interest is desirable, since that will determine the potential customer base for a retail business located at the candidate site.
It is often the case that the most desirable sites for a new retail business will be those located where the population is growing most rapidly. The identification of these rapidly growing areas, and accurate household counts within them, are thus highly valuable in any sort of site analysis. Such areas are, however, the most difficult areas in which to perform an accurate household count. Due to the lag between the establishment of a new household in an area and the reporting of that household through various data sources, most of the data sources used to provide household information become less accurate as the rate of population growth in an area increases.
There are several site analysis products currently offered in the marketplace. A leading provider of these products is Claritas Inc. of San Diego, Calif. The services provided by such companies as Claritas include a means of estimating households (or population) within a defined geographic region based principally upon U.S. Census Bureau data. Once every decade, the U.S. Census collects a broad range of statistical data about the U.S. population, including the counting of households within small geographic units of area across the United States. Since U.S. Census data is only released every ten years, the accuracy of any household count based on census data degrades as the time from when the data was collected increases. A household estimate provided shortly after the data is released is likely to be highly accurate, while a household estimate provided several years later is likely to be significantly less accurate, even if the algorithm and data sources used to provide the estimate remain the same. This problem is most pronounced in those geographic areas that are undergoing the highest growth, particularly where this growth is increasing exponentially rather than linearly from past census data. For example, an estimate of the 2007 population of the Las Vegas, Nev. metropolitan area based on a linear projection of 1990 and 2000 U.S. Census data will significantly underreport the number of households that are actually found in that area. Since it is these high-growth areas that are most likely to be of interest to those companies looking for new retail sites for their businesses, this problem with the timeliness of U.S. Census Bureau data significantly diminishes the value of these existing site analysis services. What is desired then is a means of counting the number of households in a defined geographic area that is accurate even in areas that are experiencing high growth, and even at times when several years have passed since the last U.S. Census data was collected.
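The shortfall of a linear projection in an exponentially growing area may be illustrated with a short sketch. The household counts below are hypothetical and are not actual figures for Las Vegas or any other area, and the assumption that growth continues at a constant percentage rate is made only for illustration. The function names are likewise chosen only for this example.

```python
def linear_projection(count_1990, count_2000, target_year):
    """Extend the 1990-2000 change at a constant number of households per year."""
    per_year = (count_2000 - count_1990) / 10.0
    return count_2000 + per_year * (target_year - 2000)


def exponential_projection(count_1990, count_2000, target_year):
    """Extend the 1990-2000 change at a constant percentage rate per decade."""
    decade_ratio = count_2000 / count_1990
    return count_2000 * decade_ratio ** ((target_year - 2000) / 10.0)


# Hypothetical census household counts for a high-growth area (assumed values).
households_1990 = 100_000
households_2000 = 150_000

linear_2007 = linear_projection(households_1990, households_2000, 2007)
exponential_2007 = exponential_projection(households_1990, households_2000, 2007)

# If growth is in fact exponential, this is the undercount of the linear estimate.
shortfall = exponential_2007 - linear_2007

print(f"linear estimate for 2007:      {linear_2007:,.0f}")
print(f"exponential estimate for 2007: {exponential_2007:,.0f}")
print(f"linear shortfall:              {shortfall:,.0f}")
```

Under these assumed figures, the linear estimate falls short of the exponential one by several percent after only seven years, and the gap widens with each additional year since the last census.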
One data provider, MapInfo Corporation of Troy, N.Y., has developed a population projection product that, while still employing U.S. Census data for its current household counts, does use additional data for future population projections based on modeling techniques. Such data includes consumer marketing lists and the U.S. Postal Service delivery statistics file (often referred to simply as the “del stat” file). The del stat file includes the number of post office boxes and business/residential deliveries on city, rural, and highway contract routes for every ZIP Code in the United States. While the MapInfo product improves the accuracy of its population projections using the del stat file, it still relies on U.S. Census data for its current household counts, and thus suffers from the limitations inherent in the use of this data, including an inability to provide accurate counts in high-growth areas when the census data has become stale.
Another limitation on existing methods of providing site analysis is that they cannot accurately count the number of households in a geographic area that are “seasonal,” that is, that are vacation homes or are otherwise not the primary residence of the homeowner. An accurate count of seasonal households within an area may be of great value to certain retailers, such as, for example, hardware and home improvement stores. The presence of a large number of seasonal households in an area may indicate a large potential market, even where the permanent population might not indicate the presence of such a market. U.S. Census Bureau decennial data includes a count of “vacant seasonal units” (VSUs), which are those buildings that appear to the census takers to be seasonal households since they were not occupied at the time that the census data in the area was collected. Because of the manner in which this data is collected, it is believed to be less accurate than many other types of census data. In addition, and like other census data, the accuracy of this data degrades with time. As a result, the VSU count from census data for a particular area experiencing high growth in seasonal units may be highly inaccurate within several years after the census data was collected. On the other hand, U.S. Census Bureau data is the only known direct source of data identifying seasonal households. Given the limitations of this data source, it would be desirable to provide a more accurate means of counting the number of seasonal households in an area, particularly in an area experiencing high growth, and even more particularly in an area that is experiencing high seasonal household growth.
With respect to any attempt to generate household-level population counts, still another problem is the geographical location of households that are known to exist but for which a precise geographic location is not known. One prior art method, known as “area density,” distributes population based simply on a proportional measurement of area. For example, suppose that a geographic region may be divided into ten sub-regions. Further suppose that there are one hundred households that are known to lie somewhere in the region, but it cannot be determined directly from available data in which sub-region the households may be found. The area density method would distribute the households across the sub-regions based on the ratio of each sub-region's area to that of the overall geographic region. The largest sub-regions thus will be assigned the greatest number of households, with smaller sub-regions receiving fewer households, down to the smallest sub-region which will receive the fewest. A slightly improved technique known as “block density” distribution is sometimes used where a number of households are known to exist somewhere in a U.S. Census block group, but the precise block where those households are located is not known. The block density approach simply distributes the households across the blocks that comprise the block group based on a pro-rata apportionment. The problem with both of these approaches is that they do not take into account the fact that new household construction tends to be concentrated in relatively small geographic regions that are seen to be highly desirable. In the area density example above, if a large new housing subdivision accounted for most or all of the additional households, it would likely lie in only one or two of the geographic sub-regions. The area density approach, however, would spread those households into all of the other regions as well. The result would be a significant undercount of households in the sub-regions where the subdivision was actually constructed. Likewise, this type of algorithm would overcount the number of households in the sub-regions that are not experiencing growth due to the subdivision. The same problem would occur if block density distribution were used, since the growth is likely concentrated in one or two blocks, but the distribution scheme would spread the households across all of the blocks. This is a particularly important problem since, again, high-growth areas such as the geographic sub-region where the subdivision was built are precisely the areas that are of greatest interest to many retailers for whom a site analysis is performed. It is thus also desirable to provide a means of distributing households for which an exact geographic location is unknown based on factors more accurate than a simple allocation by area, such as by housing density, in order to more accurately count households in those areas experiencing the highest growth.
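The area density example above, with ten sub-regions and one hundred households, may be sketched as follows. The sub-region labels, areas, and new-housing weights are hypothetical values chosen for illustration, and the housing-based weighting shown is only one possible reading of an allocation "by housing density," not a definitive implementation of any particular method. The `distribute` helper and its largest-remainder rounding are likewise assumptions of this sketch.

```python
def distribute(total, weights):
    """Apportion `total` whole households in proportion to `weights`.

    Fractional shares are rounded by the largest-remainder rule so that
    the allocations always sum to `total`.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = total - sum(allocation.values())
    # Hand the remaining households to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda n: exact[n] - allocation[n], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical areas (square miles) of ten sub-regions, A (largest) to J (smallest).
areas = {"A": 25, "B": 20, "C": 15, "D": 10, "E": 8,
         "F": 7, "G": 6, "H": 4, "I": 3, "J": 2}

# Hypothetical indicator of new housing: a subdivision built mostly in J.
new_housing = {name: 0 for name in areas}
new_housing.update({"I": 10, "J": 90})

unlocated_households = 100

area_allocation = distribute(unlocated_households, areas)
housing_allocation = distribute(unlocated_households, new_housing)

print("area density:   ", area_allocation)
print("housing weights:", housing_allocation)
```

With these assumed inputs, the area density method assigns only two of the one hundred households to sub-region J, where the subdivision was built, and twenty-five to sub-region A, which saw no new construction. Weighting by the housing indicator instead places ninety households in J.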