1. Field of the Invention
There are two primary methods for data accumulation and aggregation. The first is the creation of a centralized data source to which all users contribute and from which all users draw. One liability of this model, often called an "enterprise" model, is that it requires considerable energy and effort across an entire enterprise to set up forms and formats and to determine the fields and data to include. A second liability of the enterprise model is that reporting from an enterprise data source can require many hours of setup and configuration by a centralized IT function. A third disadvantage is that administration is centralized, so adding new users and rights can be burdensome or require extra IT resources. The major drawback of enterprise models is rigidity; the major advantage is data integrity and consistency. Another feature of centralized data sources that has been shifting from an advantage to a disadvantage is centralized system security. There is an increasingly strong argument to be made for decentralized, distributed, and redundant data source storage and management.
The second primary method for data accumulation and aggregation is decentralized, flexible, organic accretion and accumulation of data into temporary or permanent federated data sources. The liabilities of a decentralized method are two-fold. One is low system security unless users access the system through well designed authentication and rights management routines or virtual private networks. The second liability is that the data is in inconsistent formats and structures, so linking one set of data to another may require reconfiguration of one of the sources to match the other. Where enterprise data sources can consistently name and tag fields and require that data be input in specified formats or masks, decentralized data management can evolve multiple labels, masks, and formats for fields, making correct assignment, allocation, and linking of data with other data sources impractical or time consuming. Overcoming these liabilities, however, will enable much richer exchanges of data across and between data sources.
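The label and mask divergence described above can be illustrated with a minimal sketch. The alias table, field names, and mask-stripping rule below are all hypothetical examples, not part of the invention; they show only how records from two independently evolved sources might be normalized so they can be linked.

```python
import re

# Hypothetical alias table mapping each source's evolved labels to canonical fields.
ALIASES = {
    "phone": "telephone", "tel": "telephone", "telephone": "telephone",
    "zip": "postal_code", "zipcode": "postal_code", "postal_code": "postal_code",
}

def normalize_record(record):
    """Map a record's labels to canonical names and strip format masks."""
    out = {}
    for label, value in record.items():
        canonical = ALIASES.get(label.strip().lower())
        if canonical is None:
            continue  # unrecognized field; a real system would flag it for review
        if canonical == "telephone":
            value = re.sub(r"\D", "", value)  # drop mask characters like (, ), -
        out[canonical] = value
    return out

# Two sources that evolved different labels and masks for the same data:
a = normalize_record({"Tel": "(555) 123-4567", "Zip": "90210"})
b = normalize_record({"telephone": "5551234567", "postal_code": "90210"})
assert a == b  # after normalization, the sources can be linked on shared fields
```

In practice such an alias table would itself have to evolve organically, which is precisely the maintenance burden the passage identifies.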
As the Internet has expanded and enabled users to link to one another through mobile and other devices, the use of a centralized data server has shifted toward decentralized use of a combination of many devices that run computer-readable code, each device being capable of operating as a server. As memory capacity in smaller devices has expanded, the need to centralize large data sources on servers with sufficient memory capacity has also decreased. This decentralization of servers, and of access to servers of various types across business enterprises, introduces the possibility for users of computer processing resources to choose to operate individually or in concert. The proliferation of servers that house disconnected data sources and the emergence of cloud computing have also introduced the possibility of combining centralized and decentralized data sources on servers that can exchange data and update one another in real time. The focus of many business methods and systems prior to this patent has been on methods to enable larger enterprises to leverage access to a central data source. Little has been developed to enable owners or originators of small data sources to integrate their data into an organic federated data source. If a data originator has created value through accumulation of data, there is no easy system or method for receiving payment or other consideration for that value. Peer-to-peer and other methods of integration and of payment or consideration for data from mobile devices and other servers that link data sources into a data supply chain can provide an impetus for advancing an information economy.
Data sources themselves are expanding. The dissemination of social networking, blogging, twittering and other cumulative and accretive venues for data sources in various forms has exploded the volume of available data as well as the forms and formats in which this data is stored. Mining and analyzing data stored in these widely distributed and varied formats could be a tremendous advantage for commercial and for homeland security purposes. Determining who owns the data and how the data should be accessed may sometimes be difficult, but the capacity to do so is essential. While some of these data sources are in the public domain, for other data sources, determining fair market value for data exchange or use is necessary to facilitate exchange and fair use. This invention will facilitate that process.
Methods for tracking access to data sources and server usage, and for charging for use of server resources or access to data sources, have varied. Some of these methods include offering software as a service, calculating the number of data fields accessed, tracking the number of server operations or actions, measuring the Internet bandwidth required, or combinations of the above. As cloud computing and other distributed, multiply sourced data storage capabilities have evolved, an unintended consequence of this decentralization has been the fragmentation of data sources themselves. Many users own data sources useful for their own small subset of business functions or research or logistics assignments, but their data sources are decoupled from other data sources. Information in large public domain data sources such as those provided by the Centers for Disease Control and Prevention or DARPA could be useful for owners of small data sources to compare or contrast with their own data as part of their research or analysis efforts. The field of medical records is a good example of how many data sources of very similar data can remain decoupled and useless for broader research or for use by medical providers in different locations, organizations, or disciplines.
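The combined metering methods mentioned above (fields accessed, server operations, bandwidth) can be sketched as a simple blended charge. The rates below are hypothetical placeholders chosen for illustration, not values taught by the invention.

```python
# Hypothetical per-unit rates for three of the metering methods described above.
RATE_PER_FIELD = 0.001       # dollars per data field accessed
RATE_PER_OPERATION = 0.0005  # dollars per server operation or action
RATE_PER_MB = 0.02           # dollars per megabyte of bandwidth used

def usage_charge(fields_accessed, operations, megabytes):
    """Combine the three metering methods into a single blended charge."""
    return (fields_accessed * RATE_PER_FIELD
            + operations * RATE_PER_OPERATION
            + megabytes * RATE_PER_MB)

# e.g. 10,000 fields accessed, 2,000 operations, 150 MB transferred
print(round(usage_charge(10_000, 2_000, 150), 2))
```

A real accounting system would also have to attribute each unit to a specific user and data source, which is the tracking problem the passage raises.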
Search engines often yield results that are irrelevant, badly formed, inaccurate, or from unreliable sources. Users of a data source must vet the data source itself and then clean, organize, attribute, sometimes pay for, and otherwise process the data. Even within a business enterprise, searches of internal data sources may exhibit liabilities similar to those of data obtained through search engines. Search engines build an index of data sources in order to structure and manage their web crawling and the information they return through their algorithms. In the early days of search engines, web pages were manually registered with the search engine. The advantage of that early model was user control over information about the page itself and structured information about content and potential uses for the page. The convenience and efficiency of web crawlers from major search engine vendors like Google and Yahoo has reduced the use of user-driven registration of web pages as a method for populating search indexes. Encouraging or offering an opportunity for active user-driven indexing, as well as for use of multiple search engines and multiple methods for parsing data sources themselves, can help make data sources more available, more targeted, and more useful. Indeed, the need to identify data sources that contain data relevant to a data aggregator or federator is constant and pressing, and many formal and informal methods have been developed by researchers and other data aggregators to enable this to occur. This invention, while it uses search engines as an example of a method for identifying relevant data sources, is focused on leveraging every available method or mechanism that identifies relevant data sources for inclusion in a data supply chain or federated data source.
While there may not be a need to charge for access to data within an enterprise, there is an advantage to tracking which users are regularly updating or drawing down data and what data is changing. When data useful for an enterprise is owned by someone external to it, there is often a need to identify and credit the owner, request permission to access and use the data source, perhaps pay for the use of the data, engage in reciprocal sharing of data, or make other exchanges. Enabling collaboration among originators of unique data sources is a major hurdle for data users. This invention relates to a business method and process for managing the federation, aggregation, and accumulation of data obtained through web searches, and for offering terms or fees or other consideration(s) for access to the data source and use of the data.
2. Description of the Related Art
There is a need to aggregate and associate data into larger, differently configured, or more useful data sources so that users or owners of data sources can leverage efficiencies and use data for research, for risk mitigation, or to improve business processes. Spreadsheets and databases enable users to build an almost unlimited number of data sources with their associated formats, field masks, and other features. There are many variations for creating a data source. The most common is the use of computer-readable code to produce a spreadsheet or similar tabular structure with data arranged by labeled rows and labeled columns. When data is exported from one data source to another, frequently the conversion is from one tabular format to another tabular format that can be read by or imported into the second data source. SPSS, SAP, Oracle, SAS, and other vendors of commonly used data-handling software will often offer users a set of table types and formats to import or export. Indeed, one of the hurdles for researchers is often converting and importing data sources from prior researchers into their own data source. Many tables with labeled rows or labeled columns remain isolated from complex business processes because they are never associated with a data source upon which operations are run or from which analyses or reports are drawn or derived. These local data sources may be very useful to their originators for tracking or monitoring small components of business processes, and their originators are often motivated to maintain and update them because tracking this data is intrinsic to their daily work process and they have a need to update the data as the information available to them changes. This issue of integration and correlation of data sources also applies to non-tabular formats such as documents. Data from non-tabular formats usually requires extraction from the data source and posting into a tabular structure.
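The tabular export/import conversion described above can be sketched with standard tooling: rows exported from one source in one tabular format are re-emitted in a structure a second source can read. The column names and values below are hypothetical examples.

```python
import csv
import io
import json

# Hypothetical export from a first data source, in CSV (labeled columns).
exported = "subject_id,score\nA-1,42\nA-2,37\n"

# Parse the labeled rows and columns into records.
rows = list(csv.DictReader(io.StringIO(exported)))

# Re-emit in a format a second data source can import (here, JSON records).
as_json = json.dumps(rows)
print(as_json)
```

Note that the conversion preserves labels and values but not field masks or types (the scores remain strings), which is one reason such imports often require the additional cleanup the passage describes.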
Regardless of format, a data source may hold a value to someone other than an originator and enabling payment or consideration for use or inclusion of that data into a federated data source or data supply chain would be advantageous.
Smith's data supply chain patent (U.S. Pat. No. 7,860,760) teaches how to fold a single data source into a linked set of data sources controlled by an end user. Smith also teaches how to price data per field for inclusion in a data supply chain. However, Smith does not teach how to manage and evolve a data supply chain through exchange or dialog with owners of data sources, including extending terms for use of a data source and access to the server housing a data source. Nor does Smith teach how to implement a business method using search engines and search terms and integrating search results into a data supply chain. Oddo (U.S. Pat. No. 7,222,090) teaches a method for using key words to parse an HTML page and copy the preceding string. However, Oddo does not address data field types or masks, and focuses on products used in inventory lists rather than data in multiple formats. Franco (U.S. Pat. No. 7,725,366) teaches a supply-chain management system that has a plurality of participants and a focus on just-in-time management, but the method described is unrelated to data management. Pienkos (U.S. Pat. No. 7,272,572) teaches a handshake-style method for agreeing to the exchange of intellectual property in which an offer and consideration are extended for agreement. Pienkos further teaches the drawing down of ownership information of the intellectual property from the server housing the intellectual property. The method taught by Pienkos does not include use of search terms and establishes only a single transaction, not a continuous relationship between servers housing or drawing down data sources. Additionally, the method taught by Pienkos assumes only transfer of ownership, not an exchange or reciprocal relationship. Strickland (U.S. Pat. No. 7,844,549) teaches a method for exchange in a peer-to-peer format, though it is based on confirming legitimate licensing of the content accessed rather than ownership or copyright. Strickland also teaches a complex assignment of credits and debits, as well as direct payment, but there is no consideration of continuity or updating of data sources as they are folded into a data supply chain. Yeager (U.S. Pat. No. 7,328,243) teaches use of mobile devices in a peer-to-peer network to compare and track versions of documents, but does not extend the premise to all data sources linked into a data supply chain and therefore offers no method or process for searching for appropriate data sources.
This method will enable a user to use a search engine or other methods and tools to identify relevant data sources to fold into a data supply chain. Sorting through data sources returned by a search engine, referral, or other means to determine whether they merit inclusion in a data supply chain; notifying the owner of a data source of interest in the data; extending a request for permission to use the data; and then offering a fee or other terms for its use will facilitate a free market among data source owners, producers, and consumers.
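The identify, vet, and offer sequence described above can be sketched in outline. Everything here is a hypothetical stand-in: the stubbed search results, the relevance-threshold vetting rule, and the offer structure are illustrative only, not the claimed method.

```python
def search_data_sources(query):
    # Stand-in for a search engine, referral, or other identification method;
    # returns hypothetical candidate data sources for the query.
    return [
        {"url": "http://example.org/registry.csv", "owner": "owner-a", "relevance": 0.9},
        {"url": "http://example.org/notes.html", "owner": "owner-b", "relevance": 0.3},
    ]

def vet(source, threshold=0.5):
    # Sorting step: does this source merit inclusion in the data supply chain?
    return source["relevance"] >= threshold

def extend_offer(source, fee):
    # Notification step: a real system would contact the owner, request
    # permission, and await acceptance of the fee or other terms.
    return {"owner": source["owner"], "url": source["url"],
            "fee": fee, "status": "offer_extended"}

offers = [extend_offer(s, fee=25.0)
          for s in search_data_sources("vaccination rates") if vet(s)]
print(offers)
```

Accepted offers would then feed the ongoing relationship between servers that the preceding discussion of prior art identifies as missing from single-transaction methods.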