1. Field
Embodiments of the invention relate to the field of database searching; and more specifically, to the searching of a hierarchical database and an unstructured database with a single search query.
2. Background
Data may be stored in numerous ways, both unstructured and structured. The term “structured data” is used to refer to data that has some structure associated with the data. For example, a relational database contains structured data, as the data within the relational database is structured into tables, columns, and rows. Typically, searching structured data requires knowledge of the underlying structure. For example, in the case of the relational database, searching the relational database requires knowledge of the table names. Additionally, searching a relational database requires knowledge of a rigid searching syntax, such as SQL.
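By way of illustration only, the following minimal sketch uses Python's built-in sqlite3 module to show why structured searching requires knowledge of the underlying structure. The `suppliers` table, its columns, and its rows are hypothetical and are assumed solely for this example. The query succeeds only because the caller already knows the table name, the column names, and the SQL syntax.

```python
import sqlite3

# Hypothetical schema and rows, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO suppliers (name, city) VALUES (?, ?)",
    [("Acme Tools", "Boston"), ("Bolt Supply", "Denver")],
)

# The caller must already know the table name, the column names, and SQL.
rows = conn.execute(
    "SELECT name FROM suppliers WHERE city = ?", ("Boston",)
).fetchall()
print(rows)
```

A query that names a table or column that does not exist fails outright, which is the rigidity described above.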
Structured data may also be stored in a hierarchical database. The hierarchical database can be a tree, where each data element can be considered a node of the tree. As with relational databases, searching the structured data in the hierarchical database requires knowledge of the hierarchical structure (e.g., nodes of the tree) and also requires knowledge of a searching syntax.
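The same dependence on structure can be sketched for a hierarchical database. In the following illustrative example, the tree contents, the `lookup` helper, and the slash-separated path syntax are all assumptions made for this sketch and do not correspond to any particular hierarchical database product. A lookup succeeds only when the path matches the actual nodes of the tree.

```python
# Hypothetical hierarchy, assumed for illustration only: each key is a node of the tree.
tree = {
    "company": {
        "engineering": {"alice": {"title": "engineer"}},
        "sales": {"bob": {"title": "manager"}},
    }
}

def lookup(root, path):
    """Walk the tree along a slash-separated path and return the node reached."""
    node = root
    for part in path.split("/"):
        node = node[part]  # raises KeyError if the path does not match the structure
    return node

result = lookup(tree, "company/sales/bob")
print(result)
```

A user who does not know that `bob` is located under `sales` cannot form the path, even if the user knows the name being sought.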
The term “unstructured data” is used to refer to data that does not have structure associated with the data. A common example of unstructured data is data stored in virtual documents in an inverted index. The term “virtual document” is used to refer to a representation of data as textual data that may be indexed. As the data in an inverted index is unstructured, searching the inverted index typically consists of entering keywords. The term “keyword” is used to refer to a search string. Thus, unlike searching structured data, searching unstructured data does not require knowledge of a rigid searching syntax. However, a disadvantage of searching unstructured data is that the results may not be accurate, as keywords may be shared across numerous data sets.
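For illustration, an inverted index over virtual documents may be sketched as a mapping from each term to the set of documents that contain the term. The document contents and the `build_inverted_index` and `search` helpers below are hypothetical names assumed for this sketch, and tokenization is simplified to splitting on whitespace.

```python
from collections import defaultdict

# Hypothetical virtual documents: textual representations of stored data.
documents = {
    1: "acme tools boston supplier",
    2: "boston client jane smith",
    3: "bolt supply denver supplier",
}

def build_inverted_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, keyword):
    """Return the sorted ids of documents containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))

index = build_inverted_index(documents)
hits = search(index, "boston")
print(hits)
```

In this sketch the keyword "boston" matches both a supplier document and a client document, which illustrates how a keyword shared across data sets can reduce the accuracy of the results.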
Relational databases have a limited text searching feature. Relational databases are commonly made up of multiple relations (often called tables), which may or may not be connected. Each relation typically represents a different data domain. For example, one relation may represent product suppliers and another relation may represent clients. In order to maintain the structure of the relations within a search result, text searching is performed on a per-relation basis. In other words, as the relations represent different data domains, text searching across the multiple relations would not produce meaningful results, as there would be no indication of which relation each result belongs to. Thus, prior art relational database text searching has the disadvantage that knowledge of a particular relation is required. Additionally, when there are multiple relations, a separate text search must be performed on each relation.
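The per-relation limitation can be sketched as follows, again using Python's built-in sqlite3 module. The `suppliers` and `clients` relations, the `QUERIES` table, and the `text_search` helper are hypothetical and assumed only for this example. Substring matching with `LIKE` stands in for whatever text searching feature a given relational database provides.

```python
import sqlite3

# Hypothetical relations representing two different data domains.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (name TEXT, city TEXT)")
conn.execute("CREATE TABLE clients (name TEXT, city TEXT)")
conn.execute("INSERT INTO suppliers VALUES ('Acme Tools', 'Boston')")
conn.execute("INSERT INTO clients VALUES ('Jane Smith', 'Boston')")

# One text query per relation; each requires knowing that relation's name and columns.
QUERIES = {
    "suppliers": "SELECT name FROM suppliers WHERE name LIKE ? OR city LIKE ?",
    "clients": "SELECT name FROM clients WHERE name LIKE ? OR city LIKE ?",
}

def text_search(conn, relation, term):
    """Run a substring search against a single, named relation."""
    pattern = f"%{term}%"
    return [row[0] for row in conn.execute(QUERIES[relation], (pattern, pattern))]

# A separate search must be issued for every relation.
results = {relation: text_search(conn, relation, "boston") for relation in QUERIES}
print(results)
```

Each result is attributable to a relation only because the search was issued against that relation by name.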
Prior art techniques exist that convert structured data into unstructured data to allow for full text searching. For example, data within a relational database may be converted to a format suitable for unstructured searching (e.g., converted into an inverted index to allow for keyword searching). However, a disadvantage of converting data stored in a structured manner into data stored in an unstructured manner is that, while searching may be easier for a user (e.g., the user does not need to know the structure or special syntax), the results of the search will not include the associated structure.
Other prior art techniques exist that support keyword based searches in association with manual relational database searching. In these techniques, virtual documents are built from a relational database and are indexed into an inverted index. The virtual documents are associated with relation tuples of the relational database (e.g., by using identifiers). Keyword based searches can be performed on the inverted index, where the returned results are the identifiers of the relation tuples matching the search. The returned results may contain multiple identifiers in the case where the keyword search term matches multiple virtual documents, and thus multiple tuples. For each identifier that is returned in the result, a user is required to manually search the relational database relation corresponding to that identifier. Thus, in this prior art technique, the keyword search acts as a hint as to where in the relational database the information is located. However, this prior art technique has the disadvantage that if there are multiple identifiers, the user is required to manually search each tuple for each identifier (i.e., the user must manually form a structured query for each identifier). Additionally, if the identifiers correspond to different relations, the user is required to manually search each of those relations as well, again forming a separate structured query for each identifier.
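The two-step prior art technique described above may be sketched as follows. This is an illustrative approximation only: the relations, the `(relation, id)` identifier format, and the `FOLLOW_UP` queries are assumptions made for this example rather than details of any specific prior art system. The keyword search returns only identifiers, and a separate structured query must then be formed for each identifier.

```python
import sqlite3
from collections import defaultdict

# Hypothetical relations, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO suppliers (id, name, city) VALUES (1, 'Acme Tools', 'Boston')")
conn.execute("INSERT INTO clients (id, name, city) VALUES (1, 'Jane Smith', 'Boston')")

# Step 1: build one virtual document per tuple and index its terms
# under an identifier of the assumed form (relation, id).
index = defaultdict(set)
for relation in ("suppliers", "clients"):
    for row_id, name, city in conn.execute(f"SELECT id, name, city FROM {relation}"):
        for term in f"{name} {city}".lower().split():
            index[term].add((relation, row_id))

# Step 2: the keyword search returns identifiers only, as a hint.
identifiers = sorted(index["boston"])
print(identifiers)

# Step 3: the manual part; one structured query per identifier,
# each requiring knowledge of the relation that the identifier points to.
FOLLOW_UP = {
    "suppliers": "SELECT name, city FROM suppliers WHERE id = ?",
    "clients": "SELECT name, city FROM clients WHERE id = ?",
}
tuples = [conn.execute(FOLLOW_UP[rel], (row_id,)).fetchone() for rel, row_id in identifiers]
print(tuples)
```

In this sketch the third step is automated for brevity; in the prior art technique described above, the user performs that step by hand for every identifier returned.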