There are several problems associated with sharing aggregated data in a distributed environment. The primary problems involve: (1) enabling systems to share their “knowledge” of data; (2) enabling storage of data for distribution across the computing environment; and (3) providing a framework for efficiently creating, persisting, and sharing data across the network. The first problem, defining a run-time type system capable of manipulating strongly typed binary information in a distributed environment, has been addressed in a previous patent, attached hereto as Appendix 1 and hereinafter referred to as the “Types Patent”. The second problem associated with sharing data in a distributed environment is the need for a method for creating and sharing aggregate collections of these typed data objects and the relationships between them. A system and method for achieving this, based on a ‘flat’ (i.e., single contiguous allocation) memory model, is attached hereto as Appendix 2. Because this flat model contains only ‘relative’ references, the data can be shared across the network while all data cross-references remain valid, since those references are completely independent of the actual address of the data in computer memory. The final problem that would preferably be addressed by such a system is a framework within which collections of such data can be efficiently created, persisted, and shared across the network. The goal of any system designed to address this problem should be to provide a means for manipulating arbitrary collections of interrelated typed data such that the physical location where the data is ‘stored’ is hidden from the calling code (it may in fact be held in external databases), and such that these collections can be transparently and automatically shared by multiple machines on the network, thus inherently supporting data ‘collaboration’ between the various users and processes on the network.
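By way of illustration only, the following minimal Python sketch shows why a ‘flat’ model with relative references remains valid when it is moved. It is an assumption and not the scheme of Appendix 2: the record layout, the choice of offsets measured from the start of the allocation, and the names `build_list` and `walk` are hypothetical. A linked list is packed into one contiguous buffer, and each ‘next’ link is stored as an offset rather than a memory address, so any byte-for-byte copy of the buffer (in another process, on another machine, or read back from a file) can be traversed without fixing up pointers.

```python
import struct

# Hypothetical layout: a 4-byte header holding the offset of the first node,
# followed by nodes of (value: int32, next_offset: int32). Offsets are measured
# from the start of the buffer, and 0 means "no next node".
HEADER = struct.Struct("<i")
NODE = struct.Struct("<ii")


def build_list(values):
    """Pack values into a single contiguous buffer as an offset-linked list."""
    buf = bytearray(HEADER.size + NODE.size * len(values))
    HEADER.pack_into(buf, 0, HEADER.size if values else 0)
    for i, v in enumerate(values):
        off = HEADER.size + i * NODE.size
        nxt = off + NODE.size if i + 1 < len(values) else 0
        NODE.pack_into(buf, off, v, nxt)
    return bytes(buf)


def walk(buf):
    """Follow relative references; works wherever the buffer happens to reside."""
    out = []
    (off,) = HEADER.unpack_from(buf, 0)
    while off != 0:
        value, off = NODE.unpack_from(buf, off)
        out.append(value)
    return out


buf = build_list([3, 1, 4])
moved = bytes(bytearray(buf))  # byte-for-byte copy at a different address
print(walk(moved))  # [3, 1, 4]
```

Because offset 0 is occupied by the header, it can safely double as the ‘no next node’ marker.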
Additionally, it should be a primary goal of such a framework that data ‘storage’ be transparently distributed; that is, the physical storage of any given collection may span multiple different containers distributed across many machines on the network, while the user of the access API sees a single logical collection whose size can far exceed available computer memory.
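One way to picture a single logical collection spread over several containers is a thin index-mapping layer. The sketch below is hypothetical: the class name `LogicalCollection` is illustrative, and each ‘container’ is assumed only to support `len()` and integer indexing, so plain Python lists stand in for containers that might in practice be memory-based, file-based, or server-based.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


class LogicalCollection:
    """Presents several underlying containers as one indexable collection.

    Hypothetical sketch: the caller sees global indices only and never
    learns which container actually holds a given element.
    """

    def __init__(self, containers: Sequence[Sequence]):
        self._containers = list(containers)
        # Cumulative end index of each container, e.g. sizes [2, 3] -> [2, 5].
        self._ends = list(accumulate(len(c) for c in self._containers))

    def __len__(self) -> int:
        return self._ends[-1] if self._ends else 0

    def __getitem__(self, index: int):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Locate the container whose range covers this global index.
        k = bisect_right(self._ends, index)
        start = self._ends[k - 1] if k > 0 else 0
        return self._containers[k][index - start]


coll = LogicalCollection([["a", "b"], ["c", "d", "e"]])
print(len(coll), coll[3])  # 5 d
```

The sketch records container sizes when it is constructed; a real framework would also need to handle containers that grow, as well as insertion and deletion.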
Any system that addresses this problem would preferably support at least three different ‘container’ types within which the collection of data can transparently reside (meaning that the caller of the API does not need to know how or where the data is actually stored). The first and most obvious is the simple case where the data resides in computer memory, as supported by the ‘flat’ memory model. This container provides maximum efficiency but has the limitation that the collection size cannot exceed the RAM (or virtual) memory available to the process accessing it. Typically, on modern computers with 32-bit architectures, this limits a collection to around 2-4 GB. While this is large for many applications, it is woefully inadequate for applications involving massive amounts of data in the terabyte or petabyte range. For this reason, a file-based storage container (involving one or more files) would preferably be implemented, such that the user of a collection holds only a small stub allocation in memory while all accesses to the bulk of the data in the collection are actually to/from file (possibly memory-cached for efficiency). Because the information in the flat memory model contains only ‘relative’ references, it is equally valid when stored in and retrieved from a file, and this is an essential feature when implementing ‘shadow’ containers. The file-based approach minimizes the memory footprint necessary for a collection, thus allowing a single application to access collections whose total size far exceeds that of physical memory. There is essentially no limit to the size of data that can be manipulated in this manner; however, with such huge data sets it generally becomes desirable for access to, and search of, the data to be distributed, i.e., accomplished via multiple machines in parallel.
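The following sketch illustrates, under stated assumptions, how a memory container and a file-based container can expose the same API over identical bytes. The fixed-size int64 record format and the class names `MemoryContainer` and `FileContainer` are hypothetical; the point is that the file-based container keeps only a small stub (a path and a record count) in memory and reads each record from the file on demand.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: a single little-endian 64-bit integer.
RECORD = struct.Struct("<q")


class MemoryContainer:
    """Holds every record in one in-memory buffer."""

    def __init__(self, data: bytes):
        self._buf = bytes(data)

    def __len__(self) -> int:
        return len(self._buf) // RECORD.size

    def get(self, i: int) -> int:
        if not 0 <= i < len(self):
            raise IndexError(i)
        return RECORD.unpack_from(self._buf, i * RECORD.size)[0]


class FileContainer:
    """Keeps only a stub (file path and record count) in memory; reads go to the file."""

    def __init__(self, path: str):
        self._path = path
        self._count = os.path.getsize(path) // RECORD.size

    def __len__(self) -> int:
        return self._count

    def get(self, i: int) -> int:
        if not 0 <= i < len(self):
            raise IndexError(i)
        with open(self._path, "rb") as f:
            f.seek(i * RECORD.size)
            return RECORD.unpack(f.read(RECORD.size))[0]


# Usage: the same bytes, served from memory and from a temporary file.
data = b"".join(RECORD.pack(v) for v in [10, 20, 30])
mem = MemoryContainer(data)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
disk = FileContainer(tmp.name)
print([mem.get(i) for i in range(len(mem))])    # [10, 20, 30]
print([disk.get(i) for i in range(len(disk))])  # [10, 20, 30]
os.remove(tmp.name)
```

Opening the file on every access keeps the sketch short; the memory caching mentioned above would take its place in practice.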
For this reason, and for reasons of data sharing and collaboration, a third kind of container, a ‘server-based’ collection, would preferably be supported. Other machines on the network may ‘subscribe’ to any previously ‘published’ server-based collection and manipulate it through the identical API, without having to be aware of its possibly distributed, server-based nature.
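A server-based container can present the identical API by forwarding each call to the server. In the hypothetical sketch below, `CollectionServer`, `SubscribedCollection`, and the JSON message format are illustrative assumptions, and the network is replaced by a plain function call (`send`) so that the example runs in a single process; a real transport such as a socket connection would be substituted for it.

```python
import json
from typing import Callable, Dict, List


class CollectionServer:
    """Hypothetical server holding published collections by name."""

    def __init__(self):
        self._published: Dict[str, List[int]] = {}

    def publish(self, name: str, items: List[int]) -> None:
        self._published[name] = list(items)

    def handle(self, request: str) -> str:
        # Requests and replies are JSON strings standing in for network messages.
        msg = json.loads(request)
        items = self._published[msg["name"]]
        if msg["op"] == "len":
            return json.dumps(len(items))
        if msg["op"] == "get":
            return json.dumps(items[msg["index"]])
        raise ValueError(f"unknown op {msg['op']!r}")


class SubscribedCollection:
    """Client-side proxy exposing the same len()/get() API as a local container."""

    def __init__(self, name: str, send: Callable[[str], str]):
        self._name = name
        self._send = send  # transport: delivers a request and returns the reply

    def _call(self, op: str, **args):
        request = json.dumps({"op": op, "name": self._name, **args})
        return json.loads(self._send(request))

    def __len__(self) -> int:
        return self._call("len")

    def get(self, i: int) -> int:
        return self._call("get", index=i)


server = CollectionServer()
server.publish("readings", [7, 8, 9])
remote = SubscribedCollection("readings", send=server.handle)
print(len(remote), remote.get(2))  # 3 9
```

Because the subscriber only depends on `len()` and `get()`, calling code written against a local container would work unchanged against the proxy.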