In the past few years computing power and data storage capacity have both grown by orders of magnitude. It is now both possible and relatively inexpensive to store huge quantities of data. However, there is a downside to this progress. The problem now facing the software developer or programmer is how to make this data available to the user in a way that conveys information and at the same time avoids data overload. The first step in this process is to retrieve only the data that the user wants or needs. The second step is to present the retrieved data, usually on a computer screen, in a format that minimizes visual overload and confusion. The methods and embodiments described herein address this second step. In particular, these methods and embodiments address the issue of displaying data with a geographic component, conveying the locations, pattern, density and distribution of data points in a readily understandable and clear manner.
Much of the data being stored today has a geographic component. While the data itself is of value, it is more valuable when associated with a specific location, often expressed in terms of latitude and longitude. In its simplest form, an example of this type of data might be the location of coffee shops within a city. A list of the coffee shops only becomes useful when associated with street addresses. Even more useful is a map of the city showing the locations of these coffee shops. Yet more information can be extracted from this data if the locations of the coffee shops are ranked by proximity to the user. Ultimately, the most useful form of display for the user is a map with the user's location at the center, surrounded by the closest coffee shops.
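The proximity ranking described above can be illustrated with a minimal sketch. It assumes each location is a (name, latitude, longitude) tuple and uses the standard haversine great-circle formula; the function names `haversine_km` and `rank_by_proximity` and the spherical Earth radius are assumptions made for illustration and do not come from any system named herein.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_proximity(user, places):
    """Return places (name, lat, lon) sorted nearest-first relative to user (lat, lon)."""
    return sorted(places, key=lambda p: haversine_km(user[0], user[1], p[1], p[2]))


# Hypothetical coffee shops around a user in central Denver.
user = (39.7392, -104.9903)
shops = [
    ("far", 40.0000, -105.2700),
    ("near", 39.7400, -104.9900),
    ("mid", 39.8000, -105.1000),
]
ranked = [name for name, _, _ in rank_by_proximity(user, shops)]
```

Centering the map on `user` and drawing only the first few entries of the ranked list gives the display described in the preceding paragraph.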
Even an example as simple as this illustrates the problems facing the software developer. If a user searches for restaurants in Lake City, Colo., fewer than 10 results will be returned. These can easily be displayed on a screen, even on a small screen such as a smartphone. But if the user searches for restaurants in the Denver metropolitan area, around 3,000 results will be returned. If all of these data points are displayed on a small screen, all the user sees is icons, with the underlying map being completely hidden. One solution to this problem is that adopted by Microsoft Streets and Trips in its personal computer version. While it lists the results on the side of the screen, it displays no icons at all until the user has zoomed in to a level where the individual streets are visible. This ensures that there will be minimal overlap between the icons on the screen. This is an elegant solution for the users of a program such as Microsoft Streets and Trips, which is intended to be used to find local points of interest, or plan a trip. It is not as useful for data analysis. For example, it would not be possible to determine the distribution of Italian restaurants within a specific state using such a program, because a screen showing the entire state would not show local streets, and thus would not display any icons.
A different approach is taken by Craigslist, the popular user-to-user e-Commerce web site. When the map display option is selected for a particular category of item for sale, there may be many such items within the area visible on the screen. The program handles this by clustering items within limited geographic areas. Each cluster is represented by a circle with a number in it, indicating the number of items in the cluster. The size of the cluster varies according to the number of items, although the size is not proportional to the number of items. As the user zooms in, the clusters split into smaller clusters representing smaller geographic areas. As the user zooms out, the clusters merge into larger clusters representing larger geographic areas. A feature of this approach is that even a single item on its own in a remote geographic area will show up on the map, while other clusters may represent thousands of items for sale in a major metropolitan area. This avoids a common problem of remote low density data points vanishing from the display as the user zooms out, at least until the geographic area covered by a cluster becomes large enough that it absorbs the outlying data points. These data points do not disappear entirely, in that they are added to the number of data points within the cluster, but the information about the distribution of these data points is lost at the higher zoom levels.
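The split-and-merge behavior of such clusters can be approximated with a simple grid. The following sketch is not the Craigslist implementation, which is not described in detail herein; it assumes square cells measured in degrees, and the function name `cluster_counts` and the sample coordinates are hypothetical.

```python
import math
from collections import Counter


def cluster_counts(points, cell_deg):
    """Count (lat, lon) points per square grid cell that is cell_deg degrees wide."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Two items close together and one isolated item in a remote area.
items = [(39.7, -104.9), (39.8, -104.8), (38.03, -107.31)]

zoomed_in = cluster_counts(items, 0.25)   # small cells: every item stands alone
mid_zoom = cluster_counts(items, 1.0)     # the two nearby items merge
zoomed_out = cluster_counts(items, 10.0)  # one cell absorbs the remote item too
```

At the coarsest cell size the remote item still contributes to the count, but its separate location is no longer visible, which is the loss of distribution information noted above.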
In some instances, the data points represent different physical objects. The different types of object may be represented by different icons on the screen. There still remains the problem of visual clutter when too many icons are displayed close together. For one attempt to resolve this problem see U.S. Pat. No. 6,405,129 to Yokata entitled “Method of Displaying POI Icons for Navigation Apparatus”. This describes displaying the most common icons first, and then overlaying them with the less common icons. The result is, for example, that a single ATM will not be hidden by a cluster of restaurants. However, this approach is not helpful when all of the data points represent the same type of data, and it still does not resolve the problem of visual clutter.
For another approach that attempts to display fewer data points see U.S. Pat. No. 6,8654,832 to Cook et al., entitled “Method and Apparatus for Providing a Topology View Based on Heuristic Information Density”. This technique assigns a weighted importance to each point of interest. The weighted importance of each point of interest is summed and compared against a predefined target value. If the summed weight substantially equals the target value, the corresponding points of interest are displayed. The obvious drawback to approaches such as this is the need to come up with a technique for assigning the weighted importance to each data point. It is quite possible, depending on what the user is looking for, that the importance of each data point may vary substantially. This adds a great deal of complexity to the process. It is also possible that the weighting process itself may introduce patterns to the data, while masking the real underlying patterns within the data set.
Other approaches have used one icon to represent many data points. For an example of this, see U.S. Pat. No. 7,076,741 to Miyaki entitled “Point-of-interest icon and point-of-interest mark display method”. In this approach, when a search returns a large number of point of interest (POI) icons with high geographic density, one representative icon is displayed instead of the individual POI icons. The summary of this patent even states “a first object of the present invention is to make it easy to see roads on a map by reducing the number of POI icons displayed on the map”, acknowledging the problem that when too many icons are displayed on a map, they may even obscure the underlying map. However, the process of deciding when too many icons are displayed and the complexity of selecting the size and location of a representative icon are fraught with difficulty. Perhaps the Craigslist technique described above comes closest to achieving a useful product using a variation of this approach.
For an approach that integrates the user position into the decision making process, see U.S. Pat. No. 7,272,489 to Tu entitled “Navigation method and system for extracting, sorting and displaying POI information”. This approach is especially well-suited to displaying data on personal navigation systems, GPS devices and cell phones. It sorts the points of interest into multiple levels based on distance from the current user position. As the user zooms in or out, data points from different levels are added to or removed from the screen of the device, depending on the geographic area visible on the screen. However, no attempt is made to limit the number of data points displayed. This method is quite acceptable when dealing with local data geographically close to the user, which limits the number of available data points. It has the drawback that as the user moves, as would be expected when the user is looking at a personal navigation system, the icons have to be constantly assigned to different levels as the distance to each point of interest changes. Further, it does not work well for large geographic areas containing large numbers of data points.
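The drawback of distance-based levels can be seen in a short sketch. This is an illustration under simplifying assumptions, not the method of the cited patent: positions are planar coordinates in kilometers, and the band thresholds in `LEVEL_BOUNDS_KM` and the function name `assign_levels` are hypothetical.

```python
import bisect
import math

# Hypothetical distance bands in kilometres: level 0 is within 1 km,
# level 1 within 5 km, level 2 within 25 km, level 3 is everything farther.
LEVEL_BOUNDS_KM = [1.0, 5.0, 25.0]


def assign_levels(user_xy, pois):
    """Map each point-of-interest name to a level based on distance from the user."""
    levels = {}
    for name, (x, y) in pois.items():
        distance = math.hypot(x - user_xy[0], y - user_xy[1])
        levels[name] = bisect.bisect_left(LEVEL_BOUNDS_KM, distance)
    return levels


pois = {"a": (0.5, 0.0), "b": (3.0, 4.0), "c": (30.0, 0.0)}

before_move = assign_levels((0.0, 0.0), pois)
after_move = assign_levels((3.0, 4.0), pois)  # the user has travelled to b
```

Every point must be re-evaluated each time the user position changes, and two of the three points in this example change level after a single move.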
For a method that discriminates between data points using the relevance of a data point to the user's query, see U.S. Pat. No. 8,037,166 to Seefeld, et al., entitled “System and Method of Displaying Search Results Based on Density”. The problems inherent in this approach include the complexity of determining the relevance of a data point, and the possibility that the user will not be able to determine the relevance of a data point until the user can actually examine that data point on the map.
With the availability of large and relatively inexpensive computer display screens and large scale color printers, it would seem reasonable to consider the use of color to address the problem of displaying clusters of data points. For an example of this approach see U.S. Pat. No. 8,165,808 to Bernard Scheibe, entitled “Techniques for Representing Location Information”. Described therein is a technique for replacing clusters of data points with colors of varying densities and hues. As the user zooms in to a smaller geographic area with a few data points, the display changes to show the actual data points. This approach has several drawbacks. The colors may obscure details in the underlying map data, especially if the underlying map itself makes use of color to denote political boundaries, geologic formations, population densities, etc. The data colors and the map colors may combine to produce unintended and confusing results. When using color to denote values on a display, research has shown that unless the colors are chosen very carefully, the eye may be preferentially drawn to certain areas that are not necessarily particularly significant in terms of the data they represent. Further, the change from the color density display to the display of actual data points is jarring for the user as he or she zooms in and out. There also exists the problem of how to handle a display in which some regions have high density data areas and other regions have a low density of data. Displaying some data in color and some data as icons may be confusing to the user.
For a different clustering approach, see U.S. Pat. No. 8,3393,992 to Bradford Snow entitled “Declustering point of interest icons”. The approach described in this patent uses clustering to display data points when there are a large number of overlapping data points. It uses “superclusters” and “mini-clusters”, producing the effect of clusters within clusters. It attempts to display individual point of interest icons by placing them on the display, and when they overlap, spreading them out with pointers drawn from the individual icons to the center of the cluster. This does reduce the overlap of the points of interest, but it has the disadvantage of displaying icons removed from their true geographic locations. The data represented by many of these icons is neither at the position shown nor at the center of the cluster as suggested by the pointer. It may also add to the visual confusion because of the lines drawn from the icons to the centers of the clusters. While this approach may work for some applications, perhaps with a limited number of icons, it is not appropriate for the display and analysis of large volumes of data.
Several methods have been proposed for associating data points with zoom levels. See, for example, U.S. Pat. No. 8,490,025 to Jakobson, et al., entitled “Displaying content associated with electronic mapping systems”, and U.S. Pat. No. 8,504,945 to Jakobson, et al., entitled “Method and system for associating content with map zoom function”. The main purpose of these methods is to provide data rapidly and efficiently to small handheld devices. Different data sets are associated with different zoom levels at a server, and are then provided quickly to the user's device as he or she zooms in and out. No attempt is made to address the issue of too many icons being displayed on-screen.
Another approach for controlling the number of icons displayed depending on the zoom level is described in U.S. Pat. No. 8,600,619 to Bales et al., entitled “Method and apparatus for providing smart zooming of a geographic representation”. This patent describes how “the custom zooming application determines respective degrees of relevance of the plurality of objects based, at least in part, on the device, a user of the device, related context information, or a combination thereof”. Again this introduces a level of complexity and the need for decision-making as to exactly what data may be considered relevant. This technique may work adequately for small sets of fairly similar data, but may not be as useful for very large sets of disparate data.
Similar approaches are seen in U.S. Pat. No. 8,612,563 to Seefeld, et al., entitled “System and method of displaying search results based on density”, and U.S. Pat. No. 8,713,004 to Hands et al., entitled “Method and system for prioritizing points of interest for display in a map using category score”.
Some of the approaches in which data points are allocated to different layers or levels containing different numbers of data points, and the different levels displayed as the user zooms in and out, are referred to as “regionation”. As has been described above, some of these approaches tend towards complexity and require decisions to be made either by the software developer or by the end-user, sometimes even before the data has been fully analyzed to allow these decisions to be made properly. A better approach would be one that requires little input from the user, beyond specifying the criteria for the data retrieval, and perhaps the maximum number of data points to be displayed on the screen.
Such an approach has been adopted by the open source GeoServer software, which attempts to use what it describes as random techniques to assign data points to different levels. Unfortunately the so-called “random” approach involves nothing more than using the existing order of the data. This is highly unlikely to be random, especially with large data sets which have been bulk loaded, perhaps one geographic area at a time, or data owned by several companies, with data from one company loaded before that of another. Further, the software developer or the end-user may have placed an “order by” clause in the query used to retrieve the data and thus the order will be anything but random.
What is needed is a way to place the data in a cache with data points assigned to different regionation levels, using a truly random approach, while retaining the pattern of data points found in the retrieved data set. Data from the cache is displayed from an appropriate regionation level as the user pans and zooms. As will be known to one of ordinary skill in the art, achieving a truly random approach, or even a close approximation, is not a trivial undertaking. Further, some modifications may be necessary to this approach to ensure that isolated data points remote from large clusters of data are still displayed, even when the user zooms out from the display.
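The difference between relying on retrieval order and shuffling before assignment can be illustrated as follows. This is a sketch only and should not be read as the method of the embodiments described herein: the function name `assign_regionation_levels`, the level capacity, and the growth factor are all hypothetical, and the pseudo-random generator in Python's standard `random` module only approximates true randomness. The sketch also omits any special handling of isolated data points.

```python
import random


def assign_regionation_levels(points, max_per_level=4, growth=4, rng=None):
    """Shuffle a copy of points, then cut it into levels of increasing capacity.

    Level 0 holds max_per_level points; each later level holds growth times
    as many as the one before it. The input order is discarded by the shuffle.
    """
    rng = rng or random.Random()
    shuffled = list(points)  # copy so the retrieved data set is left untouched
    rng.shuffle(shuffled)
    levels = []
    start, size = 0, max_per_level
    while start < len(shuffled):
        levels.append(shuffled[start:start + size])
        start += size
        size *= growth
    return levels


# 100 data point identifiers in bulk-load order; a fixed seed keeps the run repeatable.
data = list(range(100))
levels = assign_regionation_levels(data, max_per_level=4, growth=4, rng=random.Random(0))
sizes = [len(level) for level in levels]
```

A display routine could draw level 0 when fully zoomed out and add each further level as the user zooms in. Passing `random.SystemRandom()` as `rng` draws on operating system entropy instead of a seeded generator.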