1. Field
The present invention relates to the management of, and access to, digital data and, more specifically, to systems, methods and computer products for conducting data searches in both structured and unstructured data sources.
2. Description of Related Art
Information content in an enterprise can be structured or unstructured. For example, structured content may include data for payroll, sales orders, invoices, customer profiles, or the like. Unstructured content includes items such as emails, reports, web pages, complaints, and information on sales, customers, competitors, products, suppliers and people. Historically, structured and unstructured data management technologies have evolved separately due to the natural separation between these two kinds of information, and because different users tend to access structured data versus unstructured data.
Methodologies used for searching structured data generally do not work well for unstructured data. Similarly, it would be inefficient to apply search methodologies designed for unstructured data to structured data. For example, a company may wish to use its repository of email communication (unstructured) to discover the identity of any customers from Delhi who have sent threatening emails. A conventional way of doing this would be to search for all emails that contain the keyword “threaten,” and then, from each returned document, extract information that can help identify the originating customer (e.g., cust-id). This information, in turn, could be used to search the company's customer database (for example, a cust-id database) in an effort to output a list of customers who reside in Delhi and who have made threatening complaints. With this conventional approach, however, it is very inefficient to discover relationships between structured and unstructured data at query time. The application has to decompose the overall query into subqueries, and then federate them to the different data sources, e.g., SQL subqueries that execute against the structured database, and keyword queries that execute against the unstructured database. Furthermore, query optimization has to be carried out by the application, since there may be many different ways in which the overall query can be decomposed and federated.
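By way of illustration only, the conventional approach described above may be sketched as follows. The sketch is not taken from any particular system. It assumes a hypothetical in-memory list of email texts as the unstructured source, a hypothetical SQLite table named customers as the structured source, and a hypothetical "cust-id: N" marker in each email. All function names, table names, and sample records are invented for the example. The point of the sketch is that the application itself must run the keyword query, extract the identifiers, and issue the SQL subquery, in an order fixed by the application.

```python
import re
import sqlite3

# Hypothetical unstructured source: raw email texts (sample data invented here).
EMAILS = [
    "From cust-id: 101. I will threaten legal action if this is not fixed.",
    "From cust-id: 102. Thanks for the quick delivery.",
    "From cust-id: 103. Do not make me threaten you again.",
]


def keyword_search(emails, keyword):
    """Step 1: keyword query against the unstructured source."""
    return [text for text in emails if keyword.lower() in text.lower()]


def extract_cust_ids(documents):
    """Step 2: extract a customer identifier from each returned document."""
    ids = set()
    for text in documents:
        match = re.search(r"cust-id:\s*(\d+)", text)  # assumed marker format
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def customers_in_city(conn, cust_ids, city):
    """Step 3: SQL subquery against the structured source."""
    if not cust_ids:
        return []
    placeholders = ",".join("?" for _ in cust_ids)
    sql = (
        "SELECT cust_id, name FROM customers "
        f"WHERE city = ? AND cust_id IN ({placeholders}) "
        "ORDER BY cust_id"
    )
    return conn.execute(sql, [city, *cust_ids]).fetchall()


def build_customer_db():
    """Create a hypothetical customer database with invented records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(101, "A. Rao", "Delhi"), (102, "B. Sen", "Delhi"), (103, "C. Iyer", "Mumbai")],
    )
    return conn


def find_threatening_customers(emails, conn, city="Delhi", keyword="threaten"):
    """Application-level federation: the call order is fixed by the application."""
    hits = keyword_search(emails, keyword)
    cust_ids = extract_cust_ids(hits)
    return customers_in_city(conn, cust_ids, city)
```

In this sketch, calling find_threatening_customers(EMAILS, build_customer_db()) returns only the Delhi customer whose email contains the keyword. The alternative plan, in which the structured filter on city is applied first and only the emails of the matching customers are then searched, would have to be written and selected by the application as well, which is the optimization burden noted above.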
What is needed is a unified system for querying both structured data content and unstructured data content.