Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.
Machine learning typically deals with representation and generalization. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory.
A core objective of a learner is to generalize from its experience. Generalization in this context is the ability of a learning machine to perform accurately on new, unseen examples/tasks after having experienced a learning data set. The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.
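By way of illustration only, the spam example and the notion of generalization described above may be sketched as follows. The sketch assumes a simple naive Bayes text classifier with add-one smoothing, which is one of many possible learners and is not asserted to be the approach of any particular system. All function names (`tokenize`, `train`, `classify`, `accuracy`) and the example messages are hypothetical and chosen solely for this illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Represent a message as a list of lowercase word tokens."""
    return text.lower().split()


def train(examples):
    """Learn per-label word counts from (text, label) training pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training messages
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_messages)
        label_total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (label_total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(classify(model, text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical training messages (the "experience" of the learner).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status report attached", "ham"),
]

# Hypothetical unseen messages used to estimate generalization.
HELD_OUT = [
    ("claim your free money", "spam"),
    ("monday project meeting", "ham"),
]

model = train(TRAIN)
held_out_accuracy = accuracy(model, HELD_OUT)
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

In this sketch, accuracy is measured on messages that were not part of the training set, which is the sense in which generalization is used above. A practical system would use far larger data sets and a more careful evaluation protocol.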
Additional background and details regarding machine learning and associated methods and algorithms may be found in Sebastiani, Fabrizio (2002), "Machine learning in automated text categorization," ACM Computing Surveys, 34(1):1-47, which is incorporated by reference herein in its entirety.
To date, knowledge discovery from existing information sources inside an enterprise has been the focus of enterprise dataset platforms and business intelligence platforms. However, building context for enterprise data in real time, e.g., as data streams in from end-user applications through which the enterprise sells or provides services, has long been a difficult and expensive task.
Although relational database systems and business intelligence systems apply online analytical processing techniques to synthesize and present data, a typical enterprise today has a large variety of data (documents, transactions, field operations, financial, etc.) in a variety of data formats (unstructured, structured) and with a high frequency of updates (velocity). Because of the volume, velocity, and variety of data that may be received, and the limitations of available data processing and management systems, most enterprises are only able to use, or make sense of, a fraction of the data or other information they receive.
Moreover, available data analysis and processing systems and methods are not capable of interpreting, and quickly adapting to, the context or contexts of enterprise data from disparate enterprise areas or industries. For example, different types of service repair businesses have different types of information concerning problem reporting, diagnoses, repairs, and service recommendations depending on the domain involved, e.g., automotive, healthcare, home appliances, information technology, or aeronautics.
Thus, there remains a need for an improved and adaptable system and method for analyzing and processing large volumes of data across different enterprises and for different enterprise applications.