The present invention relates generally to a spreadsheet population method, and more particularly, but not by way of limitation, to a system, method, and computer program product for building a representation of a corpus that can be used to populate a spreadsheet.
Spreadsheets are used throughout the business world for interactive tabular data entry. Conventionally, business users often manually extract information from documents, such as news articles, technical reports, social media posts, and regulations, to populate their spreadsheets. Populating a spreadsheet in this way can be a time-consuming task and requires additional human resources, which increases the cost for a business.
Conventional techniques require manual work to extract data and place it in cells that conform to the user's semantics for the columns, rows, and relationships between cells.
For example, one conventional technique purports to recognize cells that belong to a pre-determined category (e.g., city, state, country, and zip code) and uses that category to determine additional types of information (e.g., demographics, population) that are then made available for each cell. However, this conventional technique requires a set template with repeated values and cannot use example relationships between cells.
Therefore, a method is needed that populates a spreadsheet using examples of implicit relationships between cells and suggests data from the document corpus that may conform to these implicit relationships.