Keyword searches are often used to extract information from collections of data, such as websites available over the Internet. Traditional keyword searching has focused on information contained in HTML (Hypertext Markup Language) documents. However, because of various limitations of HTML documents, the Extensible Markup Language (XML) data model was developed. The XML data model allows extensible element tags that can be arbitrarily nested within XML documents. One challenge of keyword searching over XML documents is that a search result need not be an entire XML document; it can instead be a deeply nested XML element.
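To address these challenges, approaches based on the notion of lowest common ancestors (LCAs) have been proposed, in which a search result is an XML element that is the lowest common ancestor of elements containing the search keywords. However, conventional LCA-based techniques are relatively inefficient.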
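To make the LCA notion concrete, the following is a minimal, deliberately naive sketch in Python. It is not the conventional techniques referred to above, and it is not the method of this disclosure. It assumes that an element "contains" a keyword when that keyword appears as a whitespace-separated token in the element's own text. Each element is labeled with a Dewey-style path (the child positions from the root), so the LCA of a set of elements is simply the longest common prefix of their paths. The function names (`dewey_index`, `keyword_matches`, `common_prefix`, `smallest_lcas`) are hypothetical and chosen for illustration. The brute-force enumeration takes one match per keyword in every combination, so its cost grows with the product of the match-list sizes. That growth shows why efficiency matters for LCA-based search.

```python
import itertools
import xml.etree.ElementTree as ET


def dewey_index(root):
    """Map each Dewey path (tuple of child positions from the root) to its element."""
    paths = {}

    def walk(elem, path):
        paths[path] = elem
        for i, child in enumerate(elem):
            walk(child, path + (i,))

    walk(root, ())
    return paths


def keyword_matches(paths, keyword):
    """Return paths of elements whose own text contains the keyword as a token.

    Assumption (hypothetical): matching is case-insensitive and tokens are
    whitespace-separated; punctuation is not stripped.
    """
    kw = keyword.lower()
    return [p for p, e in paths.items() if e.text and kw in e.text.lower().split()]


def common_prefix(paths):
    """The LCA of a set of nodes is the longest common prefix of their Dewey paths."""
    prefix = []
    for parts in zip(*paths):
        if all(x == parts[0] for x in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


def smallest_lcas(xml_text, keywords):
    """Naive LCA-based keyword search returning the deepest answer elements."""
    root = ET.fromstring(xml_text)
    paths = dewey_index(root)
    match_lists = [keyword_matches(paths, k) for k in keywords]
    if any(not m for m in match_lists):
        return []  # some keyword occurs nowhere, so there is no result
    # Brute force: one LCA per combination of matches (one match per keyword).
    lcas = {common_prefix(combo) for combo in itertools.product(*match_lists)}
    # Keep only LCAs that have no other LCA strictly below them.
    smallest = [a for a in lcas
                if not any(b != a and b[:len(a)] == a for b in lcas)]
    return [paths[p] for p in sorted(smallest)]


DOC = ("<bib>"
       "<book><title>XML search</title><author>Smith</author></book>"
       "<book><title>Database systems</title><author>Jones</author></book>"
       "</bib>")

if __name__ == "__main__":
    print([e.tag for e in smallest_lcas(DOC, ["xml", "smith"])])  # prints ['book']
```

In this example, the query "xml smith" returns the first nested `book` element rather than the whole document. This matches the observation above that an XML search result can be a nested element. A query such as "search jones" matches elements in different books, so its only LCA is the root `bib` element.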