“Big data” generally refers to a large and diverse data set. Within big data, a variety of data sources produce or provide data, data streams, time-series, and updates thereof, all of which form the data set of big data. While traditional data sources, for example, repositories of structured data, can contribute to big data, the particular challenges with big data arise from the unstructured or unforeseeably composed data that flows from relatively new types of data sources, such as Smarter Cities, wearable technologies, social media, on-the-go devices, and the Internet of Things. Thus, a big data setup often uses tools and technologies specifically designed for handling, analyzing, and manipulating a wide variety of data from a wide variety of sources without undesirable latency.
Analytics is the science of data analysis. Big data analytics includes tools and techniques designed for use with big data. Big data analytics is used to gain insight into available data by analyzing that data to create, infer, deduce, or derive new information or knowledge.
A data source, also referred to interchangeably hereinafter as simply a “source”, provides data in some form to a big data configuration. Generally, the data source publishes a set of Application Programming Interfaces (APIs) through which data can be obtained from the data source.
An API implements a functionality at a system. An API is a code-based tool or method, such as a function call in a programming language, through which the functionality can be activated or operated. In the case of a data source, an API allows another system to perform an operation at a system of the data source to obtain certain data from the data source. For example, quite commonly, a data source API must be used to provide authentication and billing credentials for access to the source's data. Different sources implement different APIs to obtain different parts of their data in different manners, for different purposes, using different protocols, and the like.
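As a rough illustration of how a data source API can require authentication and billing credentials, the following Python sketch models a hypothetical metered source. `MeteredSource`, `get_readings`, `AuthError`, the API-key check, and the flat per-call price are all assumptions made for illustration; they do not describe any particular provider's API.

```python
from dataclasses import dataclass, field
from typing import Any


class AuthError(Exception):
    """Raised when a caller presents an unrecognized API key (hypothetical)."""


@dataclass
class MeteredSource:
    """Hypothetical data source whose API requires a key and bills per call."""
    valid_keys: set[str]
    price_cents: int
    usage_cents: dict[str, int] = field(default_factory=dict)

    def get_readings(self, api_key: str, limit: int = 10) -> list[dict[str, Any]]:
        # Authentication: refuse callers without a recognized key.
        if api_key not in self.valid_keys:
            raise AuthError("unrecognized API key")
        # Billing: charge this key's account a flat price for the call.
        self.usage_cents[api_key] = self.usage_cents.get(api_key, 0) + self.price_cents
        # Output `limit` placeholder records as the data returned by the call.
        return [{"reading_id": i} for i in range(limit)]


# Usage: two authenticated calls accumulate two charges against the same key.
src = MeteredSource(valid_keys={"key-123"}, price_cents=5)
first = src.get_readings("key-123", limit=3)
src.get_readings("key-123")
```

Integer cents are used here rather than floating-point amounts so that accumulated charges stay exact.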
An API generally accepts a set of zero or more input parameters. A function or method invoked by calling an API performs a corresponding functionality. The operation of a function or method can result in data manipulation, data output, or both. In the case of a data source, an API call generally results in data output from the data source, referred to as a result set.
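A minimal Python sketch of the relationship between input parameters and a result set follows. It assumes a hypothetical in-memory `DataSource` whose APIs are plain Python callables looked up by name; `get_all` takes zero input parameters and `get_by_field` takes two, and each outputs a list of records as its result set. All class, API, and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A result set is modeled here as a list of records (dicts).
ResultSet = list[dict[str, Any]]


@dataclass
class DataSource:
    """Hypothetical data source that exposes its APIs by name."""
    name: str
    records: list[dict[str, Any]]
    apis: dict[str, Callable[..., ResultSet]] = field(default_factory=dict, repr=False)

    def __post_init__(self) -> None:
        # Publish two illustrative APIs.
        self.apis["get_all"] = self._get_all
        self.apis["get_by_field"] = self._get_by_field

    def _get_all(self) -> ResultSet:
        # Zero input parameters: output every record.
        return list(self.records)

    def _get_by_field(self, field_name: str, value: Any) -> ResultSet:
        # Two input parameters: output only the records that match.
        return [r for r in self.records if r.get(field_name) == value]

    def call(self, api_name: str, **params: Any) -> ResultSet:
        # Calling an API invokes its function; that function's output is the result set.
        return self.apis[api_name](**params)


source = DataSource("city_sensors", [
    {"sensor": "s1", "kind": "traffic", "value": 42},
    {"sensor": "s2", "kind": "air", "value": 17},
])
everything = source.call("get_all")
air_only = source.call("get_by_field", field_name="kind", value="air")
```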
When a consumer application needs data that is available at a data source, the consumer application has to be integrated with the data source. The integration is configured to call the correct API of the correct data source with a correct set of parameters, to receive the result set, and to provide the result set to the consumer application.
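The integration described above can be sketched as a mapping from a consumer's data need to a specific source, a specific API of that source, and a parameter set. In this Python sketch, `Integrator`, `Integration`, and the stand-in `weather_by_city` API are hypothetical names chosen for illustration; a real integration would also handle credentials, transport protocols, and error conditions, none of which are modeled here.

```python
from dataclasses import dataclass
from typing import Any, Callable

ResultSet = list[dict[str, Any]]


@dataclass
class Integration:
    """Hypothetical binding of a consumer need to one API of one source."""
    source_name: str
    api_name: str
    params: dict[str, Any]


class Integrator:
    """Hypothetical integration layer between consumer applications and sources."""

    def __init__(self) -> None:
        # (source_name, api_name) -> callable that implements that API.
        self._apis: dict[tuple[str, str], Callable[..., ResultSet]] = {}
        # Consumer need -> the integration configured to satisfy it.
        self._integrations: dict[str, Integration] = {}

    def register_api(self, source_name: str, api_name: str,
                     fn: Callable[..., ResultSet]) -> None:
        self._apis[(source_name, api_name)] = fn

    def configure(self, need: str, integration: Integration) -> None:
        self._integrations[need] = integration

    def fetch(self, need: str) -> ResultSet:
        # Select the configured source and API, call it with the configured
        # parameters, and hand the result set back to the consumer.
        integ = self._integrations[need]
        api = self._apis[(integ.source_name, integ.api_name)]
        return api(**integ.params)


# Usage with a stand-in source API that returns canned records.
def weather_by_city(city: str) -> ResultSet:
    canned = {"Austin": [{"city": "Austin", "temp_c": 31}]}
    return canned.get(city, [])


integrator = Integrator()
integrator.register_api("weather_source", "by_city", weather_by_city)
integrator.configure(
    "local_weather",
    Integration("weather_source", "by_city", {"city": "Austin"}),
)
rows = integrator.fetch("local_weather")
```

Every new source or API requires another `register_api` and `configure` step, which hints at why such integration effort grows as the number of sources and APIs grows.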
Many entities have recognized that analyzing the wealth of data available about people, objects, and events can give their businesses an edge. Accordingly, more and more consumer applications (also referred to herein as requestor(s) or requestor application(s)) are seeking out useful data from data sources. Not surprisingly, more and more owners of data sources are preparing to sell their data to consumer applications.
The illustrative embodiments recognize that there has been explosive growth in the number of data sources, the volume of data from these sources, and the number of APIs that must be used to gain access to this volume of data. The illustrative embodiments recognize that even if the APIs are published by a data source, each API of each data source requires some integration effort, and such integration efforts quickly become non-trivial.
The illustrative embodiments recognize that even after such expensive integration efforts, the resulting data from the selected data source may not meet a requestor's requirements. The illustrative embodiments recognize that big data configurations can play an important role in enabling a requestor application to get just the right data from the right combination of sources, according to the requestor's needs.