The present application relates to computing and more specifically to software and associated systems and methods for facilitating selectively retrieving and storing and/or processing data in a networked computing environment using concurrent computing processes.
Software for facilitating retrieving or intercepting data and then processing the data is employed in various demanding applications, including big data computing applications, MapReduce jobs for enterprise cloud services, and so on. Such applications often demand efficient mechanisms for enabling selective extraction of data from among plural computing resources and/or data streams of a network, and for processing or handling the extracted data without requiring lengthy, error-prone, and costly software development efforts.
Efficient and reliable mechanisms for retrieving or collecting data and for selectively writing portions of the collected data to a target network resource (e.g., database, software process, etc.) are particularly important in enterprise applications, which may implement complex suites of distributed software and data storage mechanisms that may be spread among many servers and server clusters. In such enterprise computing environments, efficient coordination of data retrieval, storage, and processing operations among distributed network resources can be critical for efficient performance of enterprise tasks, including extraction of context information required for informing critical business decisions.
To leverage distributed network resources, a given software job or task may be divided into components, which are then distributed to different computing resources to be worked on concurrently and/or in parallel. Concurrent computing in server clusters often involves use of message brokers (e.g., Apache Kafka, also simply called Kafka herein, or other message brokers such as those that can handle real-time data feeds) to collect data from a data stream, event monitor, database, and/or cache, and then to queue and/or register the collected data in accordance with a topic and/or task. Operations may then be performed on the queued data. For example, data matching certain criteria may be written to a particular target data store, program, and/or other mechanism.
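The pattern described above can be illustrated with a minimal sketch. The sketch is an assumption made purely for illustration: it uses an in-memory queue as a stand-in for a message broker topic and does not use any Kafka API, and the names consume, run_demo, the record fields "id" and "amount", and the threshold of 100 are all hypothetical. Several concurrent workers drain records from the queue, and only records matching a criterion are written to a list that stands in for a target data store.

```python
import queue
import threading


def consume(topic_queue, predicate, target, lock):
    """Drain records from one queue and write matching records to target."""
    while True:
        record = topic_queue.get()
        if record is None:  # sentinel value: no more records for this worker
            break
        if predicate(record):
            with lock:  # serialize writes to the shared target
                target.append(record)


def run_demo(num_workers=3):
    # Hypothetical in-memory stand-in for a broker topic (not a Kafka API).
    topic_queue = queue.Queue()
    target = []  # hypothetical stand-in for a target data store
    lock = threading.Lock()

    # Selection criterion (assumed for illustration): amount above 100.
    def matches(record):
        return record["amount"] > 100

    workers = [
        threading.Thread(target=consume,
                         args=(topic_queue, matches, target, lock))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()

    # Collect data into the queue, as a broker would for a given topic.
    for i in range(10):
        topic_queue.put({"id": i, "amount": i * 25})

    # One sentinel per worker so that every worker terminates.
    for _ in workers:
        topic_queue.put(None)
    for worker in workers:
        worker.join()

    # Workers finish in nondeterministic order, so sort for stable output.
    return sorted(target, key=lambda record: record["id"])


if __name__ == "__main__":
    for record in run_demo():
        print(record)
```

In this sketch the division of work among workers is implicit, since each worker simply takes the next available record. An actual broker-based deployment would typically assign work by topic and partition instead, which is part of the overhead discussed below.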
In general, networked computing environments often lack efficient mechanisms for facilitating consumption of data from message brokers, e.g., Kafka. Existing mechanisms leveraging Kafka streams often require development of custom scripts (e.g., mapper scripts and reducer scripts used for MapReduce jobs) to manage data extraction from Kafka streams; to divide work among potentially uniquely configured servers of a server cluster; to rebuild functionality in various network locations for different targets; and so on.
The required hand-coding of substantial overhead code may complicate implementation of distributed data extraction and processing applications and associated concurrent processes.