The use of machine-generated electronic mail has become ubiquitous over the last few years. Auto-generated content such as purchase receipts, order confirmations, travel reservations, and event and social notifications is routinely created by commercial companies and organizations, and accounts for over 90% of non-spam Web mail traffic. In fact, such electronic messages (i.e., emails) can amount to billions of messages on a daily basis.
The task of precisely identifying key elements within this form of digital content in a truly scalable manner is of great importance to both users and service providers, and can be leveraged for applications such as ad re-targeting, mail search, and mail summarization.
However, conventional techniques employed by online parties rely on complex clustering mechanisms. This has many technical drawbacks, one of which is the large number of messages that must be pre-processed. That is, in order for conventional systems to properly partition messages and identify key content links, items or portions of messages, these systems need to be trained on large sample sets of messages. As a result, large amounts of system resources and network throughput are consumed during the pre-processing steps of receiving, accepting or identifying messages, before any analysis is actually performed. Any system that desires to perform message extraction must therefore devote large amounts of its processing power and memory to developing its capabilities, which drains the resources of the computing devices (e.g., servers) executing the system as well as the network infrastructure on which they operate.