The exemplary embodiment relates to a process discovery method and system for clustering, modeling, and visualizing process models from noisy logs using non-negative factorization and classification of activity sequences.
By way of background, business process discovery is the next level of understanding in the emerging field of business analytics, which allows organizations to view, analyze and adjust the underlying structure and processes that go into day-to-day operations. The challenges of a discovery procedure include gathering information of all of the components of a business process (technology, people, department procedures, and protocols), capturing concurrency, dealing with noise and incompleteness, and constructing a representation of the observed business processes and their variations. The information gathered enable viewing the causal and dynamic dependencies in processes and organizations, checking the conformance of the discovered processes with the models the organization specified (i.e. detecting good or bad deviations), fixing defects or enhancing process operations.
There are several families of approaches to do the actual discovery, many of which overlap in terms of the techniques used, such as direct algorithmic approaches (the α-algorithm), two-phase approaches (e.g., using hidden Markov Models), computational intelligence approaches (e.g., genetic process mining), etc. Such approaches may work well in specific contexts, but they have drawbacks, such as not dealing with noise and 1 and 2 node cycles, assuming one unique process to discover, producing “lossy” process mappings (that need to be adapted to fit the target language) and being rather slow.
Therefore, there is a need for a robust process discovery method that handles multiple processes in an organization, deals with noise in the process logs, and translates visually its findings. This suggests the need to combine a clustering method and probabilistic representations.