Business-to-consumer (“B2C”) emails and similar communications often may follow more structured patterns than person-to-person emails, with many being created automatically using templates. However, these templates or other workflows for creating B2C communications are not typically made available to entities interested in extracting data from these communications. It may be possible to reverse engineer data extraction templates using a corpus of B2C communications without humans having access to the corpus. For example, the corpus of B2C communications first may be grouped automatically into clusters of similarly-structured communications, e.g., without human intervention. Then, a data extraction template may be generated for each cluster, again without human intervention.