Document archives often include multiple instances of the same underlying form. Examples include standard governmental documents of a certain country that were used during a specific period in the past (e.g., 1930s German birth certificates). Because the underlying form is the same across different instances, knowledge of the form can be exploited when extracting information from the document archive by processing the forms contained therein. However, there is no known approach that can effectively identify instances of the same form in a large document collection. Accordingly, there is a need for a way to automatically recognize instances of the same form in a large document collection.