The present invention relates to data ingestion, and more specifically to dynamically performing a data ingestion process.
Data ingestion is a process of obtaining, importing, and processing data for later use or storage in a database. The process often involves altering individual files by editing their content and/or formatting them to fit into a larger document. In other words, data ingestion typically prepares data for analytics. The operation often involves a sequence of processes that transform the data into a format consistent with that of the database storing the data for analytics purposes. During data ingestion, data is read from source systems, parsed, formatted, and loaded into a storage device, such as a database. Each of these processes typically consumes substantial computational resources of the devices and tools involved in its execution.
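By way of illustration only, the read, parse, format, and load sequence described above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from any particular data load tool: the function names, the CSV input, the `sales` table, and the in-memory SQLite database are all hypothetical choices made for the example.

```python
import csv
import io
import sqlite3


def read(source):
    # Read stage: pull the raw text from a file-like source system.
    return source.read()


def parse(raw_text):
    # Parse stage: split CSV text into rows keyed by column header.
    return list(csv.DictReader(io.StringIO(raw_text)))


def format_rows(rows):
    # Format stage: convert each row to the (hypothetical) target schema,
    # trimming whitespace and casting the amount to an integer.
    return [(row["name"].strip(), int(row["amount"])) for row in rows]


def load(conn, records):
    # Load stage: insert the formatted records into the target database.
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()
    return len(records)


def ingest(source, conn):
    # Run the four stages in sequence and return the number of rows loaded.
    return load(conn, format_rows(parse(read(source))))
```

Each stage in such a sketch does its own work on the full data set, which is why every stage contributes to the resource consumption noted above.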
In some computing scenarios, a server may perform the operation on behalf of a client, or the server may assign some of the operation to be carried out by the client itself. In some scenarios, the server may notify the client of an operation result, and in other scenarios, the server may generate a result data set to be stored in the database. In still other scenarios, the server may direct the client to perform all of the processes at the client location before transferring the data for storage in the database.
Data load tools are typically deployed on either the client (source) or the server (target) for execution during data ingestion. The server requests the data load tools to perform predefined, assigned tasks (workloads) at fixed locations. At times, the server may require the tools to perform some processes, such as reading, parsing, and conversion, on the client, followed by data insertion at the server location, or may require all of the reading, parsing, conversion, and insertion processes to be executed at the server itself. Because the location (also called the computational resource set) for data ingestion is fixed, data load tools, on either the client or the server, keep performing loading operations without consideration of available system resources.
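The fixed-location behavior described above may be illustrated with a short sketch. The plan names, the `assign_locations` function, and the `available_cpu` parameter are hypothetical and are introduced only for this example; the sketch assumes the two arrangements named in the preceding paragraph (client-side reading, parsing, and conversion followed by server-side insertion, or all processes at the server).

```python
STAGES = ("read", "parse", "convert", "insert")

# Two predefined arrangements, fixed ahead of time.
FIXED_PLANS = {
    # Reading, parsing, and conversion on the client; insertion on the server.
    "split": {
        "read": "client",
        "parse": "client",
        "convert": "client",
        "insert": "server",
    },
    # Every process executed at the server.
    "server_only": {stage: "server" for stage in STAGES},
}


def assign_locations(plan_name, available_cpu=None):
    # available_cpu is accepted but deliberately ignored: with a fixed plan,
    # the location of each stage never changes with system load.
    return dict(FIXED_PLANS[plan_name])
```

In this sketch, the same plan is returned whether the client or the server is heavily loaded, which mirrors the limitation noted above.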