1. Field of the Invention
The present invention relates generally to searching application data. More particularly, the present invention relates to an interface for crawling structured application data.
2. Description of the Related Art
As the use of networks expands, the use of enterprise applications is becoming more prevalent. An enterprise application is generally a software application hosted on a server that is capable of simultaneously providing services to a large number of users on a network. Often, an enterprise application is suitable for performing business-related functions. Business-related functions may include, but are not limited to, tracking customer information, accounting, and production scheduling.
It is desirable to search for information that may be stored in or otherwise associated with applications or enterprise applications. Current methods of searching such data require a search engine to first collect the content from diverse sources and then to text-index the content. The process of collecting the content is known as crawling. The structure of application data poses numerous challenges for crawling and indexing, and for later searching, the data. Application data is oftentimes highly structured; for example, a single business object may span multiple tables in a database. Current methods of sourcing structured data include a search engine crawler plug-in. The crawler plug-in is designed to fetch documents of a data source type that is not supported by any of the search engine defined data source types. To create a plug-in, the user must become familiar with the architecture of the search engine crawler and the crawler plug-in, and must decide upon an appropriate data source model, which sets out the attributes that are to be extracted. The user then programs the crawler plug-in to implement the data source model.
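The plug-in pattern described above can be sketched as follows. This is a minimal illustration, not the API of any particular search engine; all names (`DataSourceModel`, `CrawlerPlugin`, the sample record) are hypothetical.

```python
from abc import ABC, abstractmethod

class DataSourceModel:
    """Declares the attributes to be extracted for one business object type."""
    def __init__(self, attributes):
        self.attributes = list(attributes)

class CrawlerPlugin(ABC):
    """Each business object type requires its own hand-written plug-in."""
    def __init__(self, model):
        self.model = model

    @abstractmethod
    def fetch(self):
        """Yield raw records from the application data source."""

    def crawl(self):
        # Extract only the attributes declared in the data source model.
        for record in self.fetch():
            yield {attr: record.get(attr) for attr in self.model.attributes}

class CustomerPlugin(CrawlerPlugin):
    """Hypothetical plug-in for a 'customer' business object."""
    def fetch(self):
        # Stand-in for fetching rows that may span multiple database tables.
        yield {"name": "Acme Corp", "region": "EMEA", "credit_limit": 50000}

model = DataSourceModel(["name", "region"])
docs = list(CustomerPlugin(model).crawl())
```

Note that the extraction logic lives inside the plug-in, so each additional business object type requires another such implementation.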
However, the plug-in framework is not a satisfactory solution for enabling the search of highly structured data. The plug-in framework is business object-dependent. Essentially, the plug-in code is confined to extracting the attributes of a particular data source model associated with a business object. As such, there is a significant amount of overhead in the individual implementation of each business object. Where the number of business objects to be crawled, indexed, and searched is large, individual implementation may not be a viable solution. Hence, although information is likely to be successfully crawled using the plug-in framework, the steps associated with creating the plug-in may be complicated and time-consuming. Additionally, the crawler is fixed to the business object structure defined at the time the crawler is created. When there are subsequent changes to the structure of the business object of which the crawler is not aware, the crawler becomes unable to crawl the business object.
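The brittleness noted above can be illustrated with a short sketch: extraction logic fixed at plug-in creation time fails when the business object's structure later changes. The attribute and record names here are invented for the example.

```python
def extract(record, attributes=("name", "region")):
    # Extraction hard-coded to the schema known when the plug-in was written.
    return {attr: record[attr] for attr in attributes}

# A record matching the original business object structure crawls cleanly.
old_record = {"name": "Acme Corp", "region": "EMEA"}
old_doc = extract(old_record)

# After a schema change renames 'region' to 'territory', the unmodified
# plug-in fails on the attribute it can no longer find.
new_record = {"name": "Acme Corp", "territory": "EMEA"}
try:
    extract(new_record)
    broken = None
except KeyError as e:
    broken = str(e)
```

Because the crawler has no knowledge of the modification, the failure occurs at crawl time rather than being handled gracefully.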
Moreover, each plug-in is search engine-dependent. The crawler plug-in code must conform to the APIs (application programming interfaces) of the particular search engine. In order to implement search capabilities using different search engines for an applications suite, the user must become familiar with the architecture of each search engine crawler and each crawler plug-in, and must write plug-in code that complies with the APIs of each respective search engine.
Current solutions are also not accessible to all types of applications. For example, each plug-in is dependent upon the particular application it is targeted to access. Every application, or suite of applications, requires a plug-in that relies on the library files particular to that application. Because a plug-in relies on the library of one application, the same plug-in may not be operable to crawl another application without significant deployment difficulty in merging the disparate applications.
Furthermore, the plug-in framework requires the internal structure of the business object and the execution context to be exposed to the search engine. The internal structure of enterprise applications may be proprietary. Accessing the highly structured application data, where the application data is proprietary, normally occurs through the Java Database Connectivity (JDBC) Application Programming Interface (API). The JDBC API is a standard Structured Query Language (SQL) database access interface on which crawlers may be based. Under this methodology, the crawler plug-in includes an SQL query or form-based query to retrieve the required data from the database. When the SQL statement is executed, the proprietary application data is exposed. Accordingly, through the JDBC connection, all of the proprietary data is available to be retrieved. This type of exposure could have serious implications for the application architecture with regard to deployment, dependencies, performance, and usability. Although enterprise search engines have implemented various security policies, there are no solutions that control the exposure of application data on the applications side.
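The exposure problem described above can be sketched briefly. JDBC is a Java interface; for a self-contained illustration this sketch uses Python's built-in `sqlite3` module in place of a JDBC connection, and the table and column names are invented. The point carries over: a connection granted general SQL access is not limited to the crawler's intended query.

```python
import sqlite3

# An in-memory database standing in for proprietary application tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO customers VALUES ('Acme Corp', '000-00-0000')")

# The crawler's query asks only for the searchable attribute...
crawled = conn.execute("SELECT name FROM customers").fetchall()

# ...but nothing on the application side constrains the same connection
# from retrieving every column, including proprietary data.
exposed = conn.execute("SELECT * FROM customers").fetchall()
```

Because the constraint lives only in the text of the crawler's query, not in the connection itself, all of the data reachable over that connection is effectively exposed.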