Data record extraction pertains to extracting data records containing user-generated content (UGC) from web documents. Data record extraction may be useful in web mining applications such as question answering, blog or review mining, and expert searching on web communities. For example, a user who is interested in purchasing a new car may use data record extraction techniques to mine customer reviews of that car so that the user can make an informed decision on whether to purchase it.
In prior implementations, data record extraction techniques generally assume that the data records contain a limited amount of UGC and thus have similar structures. However, due to the free-format nature of UGC, data records containing UGC generally have unstructured formats, so this assumption of structural similarity often does not hold.