The present invention relates generally to systems and methods for locating and accessing electronic content, and more particularly to systems and methods for enabling secure querying across enterprise and other such systems.
A common approach to searching and indexing content, particularly across the World Wide Web, is referred to as “crawling.” In order to perform such crawling, a program, script, or module known as a crawler or spider is used to scan publicly available information across the Web. Several search engines use crawling to provide links to data available across the Web, as well as to provide a synopsis of the content available at those links, so that a user can determine the relevance of each link displayed in response to a query, typically in the form of keywords entered into a search box in a search page or toolbar. Web crawlers typically create a copy of each page touched by the crawling, such that a search engine later can index the page copies in order to improve the performance of subsequent searches. Indexing typically creates keyword metadata, such as may be contained within a meta-tag field of the copy of the page, which can be accessed by search engines to more quickly determine the content of a page or site. A search engine then can search the entire content of a page or simply search a keywords field.
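By way of illustration only, the keyword lookup described above can be sketched with an inverted index, which is one common way to map keywords to page copies. The text does not specify any particular data structure, so the index layout, the `page_copies` data, and the function names below are all assumptions made for this sketch.

```python
import re

# Hypothetical page copies, as a crawler might have stored them (URL -> page text).
page_copies = {
    "http://example.com/hr": "Vacation policy and payroll schedule for employees",
    "http://example.com/it": "Laptop setup guide and password reset policy",
}

def tokenize(text):
    """Split text into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index mapping each keyword to the URLs containing it."""
    index = {}
    for url, text in pages.items():
        for word in set(tokenize(text)):
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return the URLs whose page copies contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(page_copies)
matches = search(index, "password policy")
```

Because the index is built once from the page copies, each later query reduces to a few set lookups instead of a scan of every page.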
A crawler typically accepts as input an initial list of Uniform Resource Locators (URLs) or hyperlinks, often referred to as “seeds” in the crawling process, and examines the content at each linked page to determine any URLs present in that page. These URLs then are added to the list of pages to be crawled. By following each additional URL in the list, the number of pages being indexed can grow exponentially. Once a page is identified by a crawler, it is indexed by a search engine or other appropriate tool and then is available for querying or searching.
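The seed-and-follow process described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than any particular crawler's implementation: the `PAGES` dictionary stands in for the Web, `fetch_links` is a hypothetical stand-in for downloading a page and extracting its URLs, and the breadth-first visiting order is one possible choice.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs linked from that page.
PAGES = {
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/c", "http://example.com/d"],
    "http://example.com/c": ["http://example.com/a"],
    "http://example.com/d": [],
}

def fetch_links(url):
    """Stand-in for fetching a page and extracting the URLs it contains."""
    return PAGES.get(url, [])

def crawl(seeds, fetch_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seeds)   # the "list" of URLs still to be crawled
    seen = set(seeds)         # URLs already queued, so no page is crawled twice
    visited = []              # pages handed on for indexing, in crawl order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

crawled = crawl(["http://example.com/a"], fetch_links)
```

The `seen` set and the `max_pages` limit are the two guards that keep the growth described above bounded: the first prevents revisiting pages that link to one another, and the second caps the total work.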
A limitation on crawling is that different data resources have varying degrees and types of security and access mechanisms. While crawlers can easily provide links to public information, there presently is no way to access a number of disparate systems, such as applications across an enterprise, while ensuring only authorized access to data by authenticated users. For example, a user might wish to search for all information across an enterprise related to a current project, whether that information is in data, email, or file form. This would require accepting and tracking security information for each system or application serving as a data source of these types, such as an email system, a file management system, a database management system, etc. The crawler then would have to be programmed to be aware of all the security requirements of each application or source, be able to authorize and authenticate users, and perform a variety of other tasks that drastically complicate and slow down the crawling process.
The problem is exacerbated when attempting to crawl enterprise applications, such as eBusiness or PeopleSoft applications, as these applications do not have simple user role mapping but instead each has a unique security model. Instead of having a single role (e.g., manager, employee, or administrator) that defines the content accessible to a user, such as may be controlled by username and password, the enterprise application business components can have a variety of different attributes that can specify whether a particular user can see a particular action or document, for example. Further, these attributes may change dynamically such that the user can have access to different content each time the user attempts to execute a query or search. For example, a given document D1 might be accessible to an employee E1, but might also be accessible to each level above E1, such as E1's project managers PM1, PM2, etc. The security model must not only account for this hierarchy, but must also account for the fact that people can move between groups or levels in the hierarchy at any time. These hierarchies are also not fixed based solely on position within a company, for example, but can be project-based, where the members of a project can change continually. This results in what can be referred to as a dynamic security hierarchy, wherein each user in the dynamic hierarchy can have a unique set of security attributes that can result in different content access at any time. Such dynamic access is far too complicated to fit into any standard user role model.
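The D1/E1 example above can be illustrated with a small sketch in which access is computed from the current hierarchy at query time rather than from a fixed role. Everything here is an assumption made for illustration: the `reports_to` and `doc_owner` structures, the user `DIR1`, and the function names do not come from any actual enterprise application's security model.

```python
# Hypothetical reporting structure: each user maps to the users directly above them.
reports_to = {
    "E1": {"PM1", "PM2"},
    "PM1": {"DIR1"},
    "PM2": set(),
    "DIR1": set(),
}

# Hypothetical ownership attribute: document D1 belongs to employee E1.
doc_owner = {"D1": "E1"}

def superiors(user, reports_to):
    """Collect everyone above `user` in the hierarchy as it exists right now."""
    found = set()
    stack = [user]
    while stack:
        current = stack.pop()
        for manager in reports_to.get(current, ()):
            if manager not in found:
                found.add(manager)
                stack.append(manager)
    return found

def can_access(user, doc, reports_to, doc_owner):
    """A user may see a document if they own it or sit above its owner."""
    owner = doc_owner[doc]
    return user == owner or user in superiors(owner, reports_to)

before = can_access("PM1", "D1", reports_to, doc_owner)

# PM1 leaves the project, so the hierarchy changes between two queries.
reports_to["E1"].discard("PM1")
after = can_access("PM1", "D1", reports_to, doc_owner)
```

In this sketch the same user and the same document give different answers before and after the project membership changes, which is the behavior a static role assigned at login cannot capture.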