Users may be inundated with business-to-consumer (“B2C”) emails and similar communications that convey a variety of information, such as travel itineraries, bills due, upcoming events, and so forth. If the user fails to set reminders, create calendar entries, or take other similar actions in response to receiving such communications, the user may, for instance, miss a meeting, fail to pay a bill, miss a flight, and so forth. Additionally, various data points in the communications that may be immediately relevant to a user, such as information related to an upcoming or current trip (e.g., flight information, hotel reservation, event/venue information, etc.), may be scattered across multiple different communications, making them difficult for the user to track down.
Data contained in B2C communications and other similar documents (more generally referred to herein as “structured documents”) often follows more structured patterns than data in person-to-person communications, because such documents are often created automatically using templates. Such templates may be useful for extracting pertinent data points, such as departure times, event locations, invoice due dates, etc. However, these templates are not typically made available to entities interested in extracting data from these communications. It may be possible to reverse engineer these templates, e.g., using various parsers and/or heuristics that may require some level of human intervention, in order to generate data extraction templates configured to extract relevant data points for presentation to the user. However, given the ever-changing content and layout of B2C communications, manually reverse engineering data extraction templates may become impractical.
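For illustration only, a manually reverse-engineered data extraction template might resemble the following minimal sketch. The template format (a mapping from field names to regular expressions), the `extract` helper, and both sample email bodies are hypothetical and are not taken from any particular system; the sketch merely shows how a template tied to one layout may stop matching once the sender changes that layout.

```python
import re

# Hypothetical hand-written "data extraction template": each field name maps
# to a regular expression anchored on the fixed text around the data point.
FLIGHT_TEMPLATE = {
    "flight_number": re.compile(r"Flight:\s*<b>([A-Z]{2}\d+)</b>"),
    "departure_time": re.compile(r"Departs:\s*<b>([^<]+)</b>"),
}


def extract(template, html):
    """Return each field the template can locate; unmatched fields map to None."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(html)
        result[field] = match.group(1) if match else None
    return result


# Invented sample bodies: the same itinerary before and after a layout change.
email_v1 = "<p>Flight: <b>XY123</b></p><p>Departs: <b>2 May, 09:15</b></p>"
email_v2 = "<td>Flight no.</td><td>XY123</td><td>Departure</td><td>2 May, 09:15</td>"

fields_v1 = extract(FLIGHT_TEMPLATE, email_v1)  # both fields are found
fields_v2 = extract(FLIGHT_TEMPLATE, email_v2)  # both fields come back as None
```

In this sketch, the second layout carries the same data points, yet the template recovers none of them until a person rewrites the patterns, which is the kind of recurring manual effort described above.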