With the development of big data computing technology, applications based on data processing have gained widespread attention. Data sources are becoming increasingly diverse in structure. The data they generate include not only traditional non-real-time, static structured data, but also many real-time, dynamically generated unstructured data streams. A real-time computing technology for distributed streaming data is therefore needed to extract the important information carried by the continuously arriving unstructured data sequences.
Currently, commonly used frameworks for processing streaming data include the Storm framework and the Spark framework. At the presentation layer of distributed real-time computing, the corresponding high-level encapsulations are the Storm native interfaces and Resilient Distributed Datasets (RDD), respectively. The Storm native interfaces offer poor encapsulation and interface abstraction, are relatively inconvenient to use, make the time window quite complex to implement, and provide relatively poor code reusability. The RDD interfaces, in turn, require the time window to be driven by the data inflow time, do not support nesting, do not allow the code to be copied, and cannot guarantee interface compatibility between batch computing and streaming computing, among other issues.
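To illustrate what it means for a time window to be driven by the data inflow time, the following is a minimal sketch in plain Python. It does not use the Storm or Spark interfaces. The record fields `event_time` and `arrival_time`, the helper `tumbling_windows`, and the window width of 10 time units are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tumbling_windows(records, width, time_key):
    """Group record values into fixed-width windows keyed by window start.

    `time_key` selects which timestamp drives the window assignment.
    """
    windows = defaultdict(list)
    for record in records:
        start = (record[time_key] // width) * width
        windows[start].append(record["value"])
    return dict(windows)


# Hypothetical records: `event_time` is when the data was generated,
# `arrival_time` is when it flowed into the computing system.
RECORDS = [
    {"value": "a", "event_time": 1, "arrival_time": 2},
    {"value": "b", "event_time": 4, "arrival_time": 11},  # delayed in transit
    {"value": "c", "event_time": 12, "arrival_time": 13},
]

# Window driven by the data inflow time: the delayed record "b"
# is grouped with "c".
by_arrival = tumbling_windows(RECORDS, 10, "arrival_time")

# Window driven by the generation time: "b" is grouped with "a".
by_event = tumbling_windows(RECORDS, 10, "event_time")

print(by_arrival)
print(by_event)
```

In this sketch the same three records fall into different windows depending on which timestamp drives the window. An interface that only permits the inflow time leaves the caller unable to choose the second grouping.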