The disclosure relates to efficient processing of large data sets using parallel and distributed systems. More specifically, the disclosure concerns various aspects of processing distributed data structures, including collaborative processing of distributed data structures using shared documents.
Enterprises produce large amounts of data based on their daily activities. This data is stored in a distributed fashion among a large number of computer systems. For example, a large amount of information is stored as logs of various systems of the enterprise. Processing such large amounts of data to gain meaningful insights into the information describing the enterprise requires a large amount of resources. Furthermore, conventional techniques available for processing such large amounts of data typically require users to perform cumbersome programming.
In addition, users have to deal with complex systems that perform parallel/distributed programming to be able to process such large amounts of data. Software developers and programmers (also referred to as data engineers) who are experts at programming and using such complex systems typically do not have the knowledge of a business expert or a data scientist needed to identify the requirements for the analysis. Nor are the software developers able to analyze the results on their own.
As a result, there is a gap between the process of identifying requirements and analyzing results and the process of programming the parallel/distributed systems to achieve the results. This gap results in time-consuming communications between the business experts/data scientists and the data engineers. Data scientists, business experts, and data engineers all act as resources of an enterprise, so this gap adds significant costs to the process of data analysis. Furthermore, this gap creates the possibility of errors in the analysis, since a data engineer can misinterpret certain requirements and may generate incorrect results. The business experts or the data scientists do not have the time or the expertise to verify the accuracy of the software developed by the data engineers.
Some tools and systems are available to assist data scientists and business experts with the above process of providing requirements and analyzing results of big data analysis. However, the tools and systems used by data scientists are typically difficult for business experts to use, and the tools and systems used by business experts are difficult for data scientists to use. This creates another gap between the analysis performed by data scientists and the analysis performed by business experts. Therefore, conventional techniques for providing insights into big data stored in distributed systems of an enterprise fail to provide a suitable interface for users to analyze the information available in the enterprise.