Data that can be entered and stored in a database and then retrieved by performing a query in what is known as “structured query language” (SQL) is generally known as “structured” data. In contrast, most text documents are known as “unstructured” data, that is, data that does not fit easily into a structured or tabular format.
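By way of illustration only, the following minimal sketch shows what retrieving structured data with an SQL query can look like. The table name `resumes`, its columns, the helper `find_candidates`, and the sample rows are all hypothetical and are not taken from any particular system; Python's standard `sqlite3` module is used only for convenience.

```python
import sqlite3

def find_candidates(rows, skill, min_years):
    """Store (name, skill, years) rows in a table and retrieve matches with SQL.

    The schema is a hypothetical example of "structured" data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resumes (name TEXT, skill TEXT, years INTEGER)")
    conn.executemany("INSERT INTO resumes VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT name FROM resumes WHERE skill = ? AND years >= ? ORDER BY name",
        (skill, min_years),
    )
    names = [row[0] for row in cur.fetchall()]
    conn.close()
    return names

# Made-up sample rows, as they might look after manual data entry.
rows = [("Alice", "Java", 7), ("Bob", "Java", 2), ("Carol", "Python", 5)]
print(find_candidates(rows, "Java", 5))
```

The query works only because each fact already occupies a known column. The same facts written as free text, for example "Alice has worked with Java for seven years", cannot be retrieved this way until they are extracted into such a table.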
This lack of structure makes certain special types of documents, such as resumes and job postings, difficult to search in their original unstructured form unless the information in the documents is extracted and entered into a database.
A major problem with such structured-data-based approaches is that they require considerable human effort, from either job seekers or job service providers, to extract data from the documents and then enter the data into the database.
Another major problem is that when searching a resume database after receiving a job description, a recruiter needs to spend time digesting the content of the job description to accurately identify the key requirements, and then distill those requirements into a few query terms. This process requires considerable training and experience, as well as considerable effort for each query.
On the other hand, document-index-based search engines can search unstructured data such as web pages or emails without first converting the documents into structured data formats, and they work well for that purpose. However, they fall short when searching special documents such as resumes or job postings.
Conventional document search methods are mostly keyword-based, and the query box can accept only a few keywords. However, special types of documents such as resumes or job postings, as well as other unstructured data sources, contain comprehensive information that can be expressed in various ways. When a user needs to search such documents, a query that contains only a few keywords usually cannot adequately represent the topics and the scope of the needed information.
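The limitation described above can be sketched with a deliberately simple keyword matcher. The functions `tokenize` and `keyword_search` and the two sample resumes are hypothetical and serve only as an illustration; actual search engines use more elaborate indexing and ranking, but the mismatch between a short query and varied wording is of the same kind.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def keyword_search(query, documents):
    """Return the ids of documents that contain every keyword in the query."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in documents.items()
            if terms <= tokenize(text)]

# Two made-up resumes describing broadly similar experience in different words.
resumes = {
    "r1": "Senior Java developer with seven years of backend experience.",
    "r2": "Built server-side systems on the JVM for a decade; "
          "led a team of five engineers.",
}

print(keyword_search("java developer", resumes))
```

In this sketch only `r1` matches the query `java developer`. The second resume describes comparable experience but uses neither keyword, so a short keyword query does not retrieve it.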
More effective methods for searching unstructured data are needed.