Any entity that engages in interpersonal communications may need to store and analyze those communications. Businesses regularly log such communications as a conventional business process. The communications typically comprise information items such as logs, notes and events from ordinary business and software processing. More particularly, for web transactions and web-based activity, such items comprise web logs, customer service representative notes, phone call logs, and event logs. When analyzed, such informational items and contact logs can yield useful information about the topics of, or the participants in, the communications. Prior known methods for such analysis typically group the informational items into logical groups, e.g., website visits or call center phone calls, and then analyze and act on the groupings rather than on the individual logs themselves. Such groupings make sense when the information content in any single log item is too small for conventional document analysis techniques. The subject embodiments provide improved methods and systems for applying models to informational data from multiple disparate sources, so that the data can be better grouped into analyzable documents.
A particular problem with conventional automated record keeping systems, which generate business and software process logs, notes and key events to maintain ordinary records for business or legal purposes, is that at the level of a single log the information content is too low to yield useful analysis with common document topical analysis techniques. Known document analysis techniques, such as latent Dirichlet allocation (LDA), typically operate on documents with larger information content. While LDA clusters large information sets, it does not inherently provide a way to group transactional items that have low information content. To take advantage of such analytical techniques, the logs, notes and key events from available data sources need to be grouped into meaningful categories applicable to a business, organization or workflow. The power and flexibility of LDA are further enhanced once such logical groupings, and norms for labeling them, have been established.
A particular problem addressed by the subject embodiments is that known systems establish logical groupings for document analysis by forming silos of information from particular information sources. For example, phone call information is treated differently from webserver traffic, even though there are both techniques for and a need for viewing them in conjunction, for example as logical information groupings drawn from call and web logs together. Clustering information based solely upon one source, or some other defined silo, restricts the usefulness of the results when analysis techniques such as LDA are applied to the information groupings.
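The contrast between siloed and cross-source groupings can be illustrated with a minimal sketch. It is not the method of the subject embodiments. It assumes, purely for illustration, that every log item carries a shared `customer_id`, a `timestamp` and a `text` field; those field names, the `build_documents` helper and the sample data are all hypothetical. Items from the call and web sources are merged into one document per customer, which could then be passed to a topic model such as LDA.

```python
from collections import defaultdict


def build_documents(sources, key="customer_id"):
    """Merge log items from several sources into one text document per key value.

    `sources` maps a source name (e.g. "web", "call") to a list of log items.
    Each item is assumed to be a dict with the grouping key, a "timestamp"
    and a "text" field (hypothetical schema, for illustration only).
    """
    grouped = defaultdict(list)
    for source_name, items in sources.items():
        for item in items:
            grouped[item[key]].append(
                (item["timestamp"], source_name, item["text"])
            )

    documents = {}
    for group_key, entries in grouped.items():
        # Order items chronologically so the document reads as one interaction history.
        entries.sort()
        documents[group_key] = " ".join(text for _, _, text in entries)
    return documents


# Hypothetical sample data from two otherwise separate silos.
sources = {
    "web": [
        {"customer_id": "c1", "timestamp": 1, "text": "viewed billing page"},
        {"customer_id": "c2", "timestamp": 2, "text": "searched router setup"},
    ],
    "call": [
        {"customer_id": "c1", "timestamp": 3, "text": "called about late fee"},
    ],
}

documents = build_documents(sources)
for customer, text in sorted(documents.items()):
    print(customer, "->", text)
```

Grouping by a shared key rather than by source is only one possible choice; a session identifier or a time window could serve the same purpose, depending on what the available logs actually record.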