1. Field of the Invention
This invention relates to systems and methods for setting prices for sale or purchase of data items.
2. Description of the Related Art
Many users of electronic devices create or produce data in the course of their daily work. Some electronic devices also automatically create and log or store data as they perform functions intrinsic to their use and design. A problem for consumers of data, such as data aggregators, federators, data warehouses, researchers, brokers, and resellers, arises because data producers, or the computer readable code implemented on devices that house or generate data, configure, enter, and store data in a multiplicity of formats. Incompatible formats require the purchaser or consumer of data to reformat it before it can be federated or aggregated into a larger dataset for research, analysis, repurposing, or reuse. Because data is stored in many different formats, aggregators and federators of data have attempted to implement data structure standards from the top down to force data producers or creators to accumulate and post data in standardized or preferred formats. Business and professional groups form a multiplicity of consortia to set data formatting, structuring, tagging, and labeling standards. Business groups and national and state agencies lobby legislators and are often able to get legislation in place requiring people and organizations that exchange data to provide the data in specific formats. There are currently hundreds of separate data structure standards for data sources. These “top down” approaches require coordination and regulation and, even when they are well conceived, do not necessarily motivate producers of data to expend the effort to shape their data according to the standard. The processing required to reformat the data, often called “data wrangling” or “data transformation,” consumes time and energy, is complex, and is both expensive and prone to error.
The need to wrangle data reduces the ability of a data warehouse, aggregator or federator to leverage resources and increases the cost of data processing and other uses and applications of data. Data that might be useful for research may never be part of an aggregated or federated dataset due to incompatibility of data formats. Tools and systems that are used to do these transformations or translations are often called “middleware.”
For persons who enter, format, configure, collect, post to a dataset or database, distribute, or sell data, undertaking the reformatting and conversion of their data can be a significant hurdle. They will sometimes forgo the potential benefit from selling their data to avoid the labor involved in reformatting it. Some enlist or purchase services from one of the many businesses and consulting firms that have emerged to facilitate the transformation of data into alternate formats. Others use one or another middleware software program to convert their data into an alternate format. These organizations and software programs function much like foreign language translators or translation tools, translating from one vocabulary and grammar to another. Translating datasets or data items is similar to translating a document written in a foreign language: the resulting translation is prone to contain errors because idioms, dialects, and alternate meanings can confuse even native speakers. These data sources are also often converted in their entirety, rather than only the specific items in the dataset that may have real value to the purchaser. Data producers may need to remove subsets of data from these datasets, particularly if they contain confidential or protected information, adding yet another step to an already tedious process.
Purchasing data should be as simple and easy as purchasing any other commodity, but the issues described above regarding data formatting also create problems for data pricing. The unit of data that is most relevant for the purchaser is as simple as the one or a plurality of questions that he is posing and the one or a plurality of answers to those questions. Intuitively one might believe that queries that access datasets pose a question (query) and the data that is extracted is the answer. However, this is not actually the case. A question may be embedded within a query, but it is obscured in the complexity of query construction. Furthermore, producers and consumers of data are required to possess specialized expertise and knowledge to design and implement queries. The invention described herein will enable pricing of data through a streamlined pairing of data items to facilitate format matching and exchange of data. It will serve to enable a “bottom up” process that rationalizes and facilitates pricing and data exchange.
When a researcher collects data for research, the necessary and sufficient information to enable posting into a dataset for application of research, analysis, or further processing is contained in a pair of data items from a dataset. The first item in the data item pair is the data point or target, and the second item in the data item pair is the observation upon that data point, target, or object that constitutes the research information. Target plus observation is the universal minimal requirement for utility of shared data. In a research context, one can think of the target data item as the “Question” and the observation data item as the “Answer.” In effect, this is the “necessary and sufficient” criterion for rational valuation of data. Data is of value if it provides answers to questions. Pricing for data should fundamentally reflect the value of this paired information: the “question” and the “answer.” Other variables that may affect pricing, aside from the importance, value, and significance of the question, are the accuracy, rarity, and utility of the answer. These pricing variables, with the paired question and answer, reflect the “supply and demand” equation common to all commodity markets.
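The pricing variables named above can be illustrated with a minimal sketch. The text does not specify a pricing formula, so everything here is an assumption made for illustration: the names PricingFactors and suggested_price, the 0-to-1 scale for each factor, and the choice to scale a base price by the importance of the question and by the average of the accuracy, rarity, and utility of the answer.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PricingFactors:
    """Hypothetical 0..1 scores for one question/answer pair."""
    importance: float  # significance of the question
    accuracy: float    # accuracy of the answer
    rarity: float      # rarity of the answer
    utility: float     # utility of the answer


def suggested_price(base_price: float, factors: PricingFactors) -> float:
    """Scale a base price by the question's importance and the answer's quality.

    The multiplicative form and the equal weighting are illustrative
    assumptions, not a formula taken from the text.
    """
    for name, score in asdict(factors).items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {score}")
    answer_quality = (factors.accuracy + factors.rarity + factors.utility) / 3
    return base_price * factors.importance * answer_quality


# Example: a moderately important question with an accurate, fairly useful answer.
example = PricingFactors(importance=0.5, accuracy=1.0, rarity=0.5, utility=0.75)
price = suggested_price(10.0, example)
```

Any monotonic combination of the same factors would serve equally well; the point is only that the price attaches to the pair rather than to the dataset as a whole.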
Each item in a data item pair has at least two attributes: an object name and a data type. In most instances the data point or target is a text string, such as the name of an element, product, or person serving as the target for an observation, whereas the observation is often an integer or other value, but may also be a text string, a date, or another data type. Within data types are variations, usually called “masks,” that reflect the domain for the data type. For example, the integer data type might represent a number of dollars or a number of dimes, and the masks might therefore require differing decimal placements. These units are an external or contextual concern, and the domain expert will, in most instances, be aware of the context for the data point that serves as the “Answer.” A series of these pairs can readily be shaped through computer readable code for posting into a table or other typical data structure. In the early days of data transfer, this fundamental pairing of the target data item with the observation was stored or posted in simple tables, but the computer readable code that drove data collection began to add layers of complexity to the tables, resulting in databases and so forth. This natural and understandable trend, however, induced ever larger data collections under the assumption that the producer was also the consumer. For a data exchange market to operate efficiently, the data to be exchanged needs to be as parsimonious and singular as possible to support market pricing, to limit the bandwidth required, and to increase the relevance and immediate utility of the data.
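As a rough sketch of the pairing described above, the following code models a data item with an object name, a data type, and an optional mask, and shapes a series of pairs into a simple two-column table. The class and function names (DataItem, DataItemPair, render, to_table) are hypothetical, and representing a mask as a number of decimal places is an assumption chosen only to mirror the dollars-versus-dimes example; the sample values are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple, Union

Value = Union[str, int, date]


@dataclass(frozen=True)
class DataItem:
    object_name: str                      # e.g. "product" or "unit_price"
    data_type: type                       # e.g. str, int, date
    value: Value
    decimal_places: Optional[int] = None  # hypothetical "mask" for integers

    def __post_init__(self) -> None:
        # Reject values that do not match the declared data type.
        if not isinstance(self.value, self.data_type):
            raise TypeError(
                f"{self.object_name}: expected {self.data_type.__name__}, "
                f"got {type(self.value).__name__}"
            )

    def render(self) -> str:
        """Apply the mask, if any, when presenting an integer value."""
        if isinstance(self.value, int) and self.decimal_places:
            scaled = self.value / 10 ** self.decimal_places
            return f"{scaled:.{self.decimal_places}f}"
        return str(self.value)


@dataclass(frozen=True)
class DataItemPair:
    target: DataItem       # the "Question"
    observation: DataItem  # the "Answer"


def to_table(pairs: List[DataItemPair]) -> List[Tuple[str, str]]:
    """Shape a series of pairs into rows of a simple two-column table."""
    return [(p.target.render(), p.observation.render()) for p in pairs]


# Invented sample data: the same kind of integer read under two different masks.
pairs = [
    DataItemPair(DataItem("product", str, "widget"),
                 DataItem("unit_price", int, 1250, decimal_places=2)),
    DataItemPair(DataItem("product", str, "gadget"),
                 DataItem("unit_price", int, 125, decimal_places=1)),
]
table = to_table(pairs)
```

Keeping the mask alongside the value, rather than baking it into the stored number, leaves the unit as an external concern that the domain expert resolves, which matches the description above.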
The invention herein pairs a question with an answer, or a hypothesis with an experimental observation, or a research target with an observation upon that target, to enable two parties to exchange the information and set a value for it.