The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.
As the need for real-time big data analytics grows, more data operators will build new big data pipelines or reimplement their existing pipelines in an attempt to improve performance. However, building a big data pipeline poses a number of challenges. Provisioning and operating big data platforms are time-consuming and difficult. Engineers with expertise in big data are expensive and rare, and implementing performant and fault-tolerant pipelines is difficult even for experienced engineers. Building a custom data pipeline requires hardware, substantial human resources, and a long time frame. Monitoring, operating, and debugging a pipeline are expensive and difficult, and implementing a pipeline requires many different technologies. Support licenses for many big data technologies are also expensive.
To overcome these difficulties, we analyzed a variety of production data pipelines. The analysis revealed that the logic contained in production data pipelines falls into two categories: (1) generic processing and (2) custom processing. Generic processing includes processing tasks that are commonly performed by most big data operators. Some examples of generic processing tasks are filtering, enrichment, and aggregation. Custom processing executes business logic that is highly specific to a particular entity.
Generic processing is usually performed on large data sets and is difficult to implement in a performant and fault-tolerant manner. Custom processing is typically performed on smaller data sets and has well-defined specifications. Because generic processing use cases are both pervasive and difficult to implement, it is desirable to implement them as streaming microservices.
An opportunity arises to implement streaming microservices as powerful data processing tools that accelerate application development and deployment, improve performance, and reduce the cost of provisioning and maintaining data pipelines.