1. Technical Field
The invention relates generally to communication network systems, and more specifically, to a method, system and computer medium for locating, extracting and transforming data from unrelated relational network data sources into an integrated format that may be universally addressed and viewed over network systems according to a hierarchical representation.
2. Description of the Related Art
There are conventionally-known ways of indexing and addressing information on the Internet (also referred to interchangeably as the “Net”) using an Internet directory. An Internet directory is an application service that generally performs information retrieval based on properties associated with the data of interest. Internet directories can store various types of objects, wherein each object is associated with a type of property or characteristic. For example, one type of Internet directory that provides a standard way of indexing and addressing the computer servers that host Net sites is the Domain Name System (DNS). Typically, a DNS server includes a method of creating a symbolic name for an Internet Protocol numeric address associated with the hardware of the Net server, and provides the .com, .net, .org, etc., domain addresses.
Along with DNS, users are additionally able to determine an address for documents through the HyperText Transfer Protocol (HTTP), which provides a Uniform Resource Locator (URL) for a page formatted with HyperText Markup Language (HTML). This addressing technique provides users a way to access any web page in the world. Although this technique has worked well as a hierarchical addressing scheme during the initial growth of the worldwide web (Web), the amount and importance of the data continues to expand. In particular, an increasing amount and widespread diversity of the information underlying a significant portion of the world's economy resides in critical data records inside databases. Yet, there is no simple and effective manner in which to address and reference such data records originating from diverse heterogeneous databases according to context. For example, there is no conventional standard URL for a sales total, inventory, or a customer record in a database. Accordingly, there is a growing need to reach a finer level of granularity of data addressing and management.
A new level of “granularity” is needed in order to locate and distribute information that is increasingly fragmented in its locale, but that potentially gives rise to value-added benefits when integrated with information from other sources. The evolution of the Internet has created an entirely new set of challenges that include dealing with the millions of web sites, billions of documents and trillions of objects that are now available in an increasingly decentralized computer environment. A completely decentralized Net creates a critical need to categorize (i.e., index) information and provide an address (i.e., location) for each piece of data on the Net. If this does not occur, the Net becomes something like a large telephone system without a telephone directory to look up and locate the numbers of individuals and groups. While developers have standardized techniques to organize and communicate much of this information through the conventional indexing techniques described above, they have not adequately addressed the following problems.
In the past, conventional client-server computing was inward-focused and directed to a tightly controlled environment. More specifically, conventional client-server computing was developed for distributed networks, and in particular, for use inside an enterprise or organization. Frequently, many enterprises store their data in a collection of disparate databases and deploy applications based on their short-term departmental needs. This conventional approach becomes increasingly problematic as an enterprise grows and the information contained in these disparate databases becomes increasingly difficult to integrate. The narrow scope of each application can eventually become a hindrance to the overall needs of the organization as information databases grow and change along with the evolving state of the enterprise.
The difficulties of the inward-focused model are more clearly understood when considered in the context of the future growth pertaining to the Net-based economy, which explodes the conventional inward-focused model into an environment that is highly decentralized and far more open to outward-focused computing. One key problem confronting enterprises that attempt to migrate their businesses onto the Net is how to take advantage of existing lines of business applications that are still bound to the inward-focused client-server model. As such, it would be beneficial to provide enterprises and organizations experiencing this problem with a way to unlock their data for use by other applications and other users. By doing so, these “back office” applications do not risk becoming isolated “islands of automation” in an endless ocean of information. Accordingly, it would be beneficial to be able to access and selectively assemble such data from disparately-located data sources and to automatically manage the data with an integrated view of the network and the application infrastructure. What is needed is an efficient integrated solution to a fragmented and distributed enterprise information system.
Directory services are an established component of the network infrastructure, ranging from the Internet's DNS to electronic mail (email) systems, and to the Operating System (OS) domains of corporate intranets. Applications that can leverage the strength of this infrastructure are on the rise and are placing new demands on the directory architecture. Given the dramatic growth of e-commerce, it would be desirable to move directory-enabled applications toward a model of centralized administration. Centralized administration is beneficial because it would allow tasks to be administered from anywhere in a network. To this end, directory-enabled applications moving towards a model having centralized administration would be better-suited to enable access to a richer set of data than provided by conventional directories.
However, for corporate information technology (IT) staff deploying directories in the past, the process has often proven to be slow and expensive. Conventional Internet directory deployment is slow because the process is complicated for at least several reasons. First, conventional Internet directories suffer from the “yet another database” syndrome. Because the source of the directory information frequently exists in other parts of the infrastructure, the issues of resolving authoritative ownership of the data can be problematic. Second, the inconsistency amongst the various data sources conventionally requires reconciling the different data formats and data models associated with each disparate data source. Third, synchronizing data from disparate sources into the directory requires extensive and careful planning.
These complexities in turn result in higher costs, which is another problem typically experienced with conventional Internet directory deployment. Interestingly, a leading directory market research firm (e.g., the Burton Group) has estimated that a typical enterprise directory might take a year to deploy and cost up to $2 million.
The Lightweight Directory Access Protocol (LDAP) is a standard directory protocol that can be used to establish a universal addressing scheme. However, the complexity of deploying LDAP alone is a drawback holding back the development of such an addressing scheme, as discussed below. LDAP is an open Internet standard addressing scheme for accessing directories that has been adopted by the Internet Engineering Task Force (IETF) standards regulation organization as well as by leading developers in the computing industry. Generally, LDAP is a type of Internet directory service based on the International Telecommunication Union (ITU) X.500 series of recommendations, and which facilitates property-based information retrieval by using one or more Internet transports as a native means for establishing communication between client and server computers. In particular, LDAP is an object-oriented protocol enabling a client to send a message to a server and to receive a response. The server typically maintains a directory of object entries, and the message sent from the client can request that the server add an object entry to the directory. Those skilled in the art will recognize that adding an object to a directory is accomplished by instantiating the object. The data model associated with LDAP includes entries, each of which has information (e.g., attributes) pertaining to an object. The entries can be represented by a hierarchical tree structure. A third version of LDAP is known by those skilled in the art to be defined in RFC 2251.
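By way of illustration only, the entry-and-attribute data model and the hierarchical tree described above may be sketched as follows. This is a minimal in-memory sketch and not an LDAP implementation: the names Entry and ToyDirectory, the example.com naming suffix, and the sample attribute values are assumptions introduced here for illustration, and no network protocol is involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One directory entry: a distinguished name (DN) plus attributes."""
    dn: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


class ToyDirectory:
    """Hypothetical in-memory stand-in for a directory server (not real LDAP)."""

    def __init__(self, suffix: str) -> None:
        # The suffix is the root of the tree, e.g. "dc=example,dc=com".
        self._entries: dict[str, Entry] = {suffix.lower(): Entry(suffix)}

    def add(self, dn: str, attributes: dict[str, list[str]]) -> None:
        """Add an entry directly beneath an existing parent entry."""
        key = dn.lower()
        # Naive parent lookup: drop the leftmost component. Real DNs may
        # contain escaped commas, which this sketch does not handle.
        _, _, parent = key.partition(",")
        if key in self._entries:
            raise ValueError(f"entry already exists: {dn}")
        if parent not in self._entries:
            raise ValueError(f"parent entry does not exist: {dn}")
        self._entries[key] = Entry(dn, attributes)

    def search(self, base: str, attr: str, value: str) -> list[Entry]:
        """Return entries at or below `base` whose `attr` contains `value`."""
        base = base.lower()
        return [
            entry
            for key, entry in self._entries.items()
            if (key == base or key.endswith("," + base))
            and value in entry.attributes.get(attr, [])
        ]


directory = ToyDirectory("dc=example,dc=com")
directory.add("ou=people,dc=example,dc=com", {"objectClass": ["organizationalUnit"]})
directory.add(
    "cn=Ann Lee,ou=people,dc=example,dc=com",
    {"objectClass": ["person"], "mail": ["ann@example.com", "alee@example.com"]},
)
hits = directory.search("dc=example,dc=com", "mail", "alee@example.com")
print([entry.dn for entry in hits])
```

In this sketch, each entry is addressed by its position in the tree, an attribute may hold more than one value, and retrieval is driven by a property of the entry rather than by its location, which corresponds to the property-based retrieval noted above.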
Although LDAP can be used to enable queries and updates to be made to a directory structure, the LDAP implementation alone does not and has not conventionally provided a reliable and scalable enterprise directory, primarily because recursive inquiries are required to accommodate the disparate syntax and semantics used by various database providers. The recursive inquiries involve re-synchronizing information existing in unrelated data sources on an ongoing basis due to the incompatibilities introduced by the disparate data models of each data source. Furthermore, as the number of records in the relational table increases, the need for additional recursive inquiries impedes the reliability, efficiency and scalability of the directory.
In order to take advantage of the features of an LDAP directory, this directory must be first created and populated. Since most of the data that would become the source for this directory resides essentially in RDBMS, the complexity of converting the relational data model to the hierarchical data model is problematic. Conventional directory technology can be built on top of an RDBMS engine, but the internal logic and data model of an LDAP directory is so different from an RDBMS, that this conversion is always required. The internal logic of the RDBMS is typically irrelevant from the perspective of the directory, since the entire schema and organization of the directory is based on LDAP, which is modeled as an object-oriented database with inheritance, object class, attributes, and entries. This difference in data representation and data model is problematic because it forces the directory-implementer through a complex and lengthy data modeling and conversion effort. For example, in conventional directory implementations, the data that resides in the RDBMS must be extracted, and converted into a different information model and format (e.g., LDIF as is known in the art) as an intermediate form, and then imported into an LDAP-based directory. To maintain current information in the directory, this process must be repeated on a regular basis, which brings about re-synchronization.
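By way of illustration only, the conventional extract-and-convert step described above may be sketched as follows. The sketch assumes a hypothetical relational table named employee, a hypothetical naming suffix, and the inetOrgPerson object class as the target; it further assumes that all values are plain ASCII, since LDIF requires special encoding for other values. The resulting text would then have to be imported into the directory, and regenerated whenever the relational data changes.

```python
import sqlite3


def rows_to_ldif(conn: sqlite3.Connection, base_dn: str) -> str:
    """Render each row of the hypothetical `employee` table as an LDIF entry."""
    cursor = conn.execute("SELECT id, name, email FROM employee ORDER BY id")
    blocks = []
    for emp_id, name, email in cursor:
        lines = [
            f"dn: uid={emp_id},{base_dn}",
            "objectClass: inetOrgPerson",
            f"uid: {emp_id}",
            f"cn: {name}",
            # Assumption: the surname is the last word of the name column.
            f"sn: {name.split()[-1]}",
            f"mail: {email}",
        ]
        blocks.append("\n".join(lines))
    # LDIF separates entries with a blank line.
    return "\n\n".join(blocks) + "\n"


# Stand-in for an enterprise RDBMS: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann Lee", "ann@example.com"), (2, "Bo Chen", "bo@example.com")],
)
ldif_text = rows_to_ldif(conn, "ou=people,dc=example,dc=com")
print(ldif_text)
```

Even in this small sketch, the mapping from columns to attributes and from primary keys to distinguished names must be designed by hand, which suggests why the conversion effort grows with the number and complexity of the source schemas.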
There are other problems associated with this conventional process. First, translating RDBMS logic into an LDAP-based directory is not a lossless process. For example, data types commonly used by RDBMS applications do not exist in the LDAP model. Such data types include, but are not limited to, date and floating-point number fields. Conversely, some requirements of LDAP, such as multivalue attributes, do not have an exact translation in an RDBMS. Additionally, the lack of transaction support afforded by LDAP directories means that the success of a “batched import” is not always guaranteed.
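By way of illustration only, the two mismatches noted above may be sketched as follows. The sketch assumes a toy directory representation in which every attribute value is stored as a string, and it uses hypothetical table names (person, person_phone) and hypothetical attribute names (hireDate, salary). A multivalue attribute is modeled on the relational side by a separate child table, and the typed date and floating-point columns lose their types on the directory side.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, hired TEXT, salary REAL);
    -- A multivalue attribute has no single-column equivalent, so the
    -- relational schema needs a child table with one row per value.
    CREATE TABLE person_phone (person_id INTEGER, phone TEXT);
    INSERT INTO person VALUES (1, 'Ann Lee', '2001-03-15', 52000.5);
    INSERT INTO person_phone VALUES (1, '555-0100'), (1, '555-0101');
    """
)


def to_directory_attributes(conn: sqlite3.Connection, person_id: int) -> dict:
    """Flatten one person and its child rows into string-valued attributes."""
    name, hired, salary = conn.execute(
        "SELECT name, hired, salary FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    attrs = defaultdict(list)
    attrs["cn"].append(name)
    # Assumption of this sketch: the directory side keeps only strings, so
    # the date and floating-point types are not preserved.
    attrs["hireDate"].append(hired)
    attrs["salary"].append(str(salary))
    phones = conn.execute(
        "SELECT phone FROM person_phone WHERE person_id = ? ORDER BY phone",
        (person_id,),
    )
    for (phone,) in phones:
        # Several child rows collapse into one multivalue attribute.
        attrs["telephoneNumber"].append(phone)
    return dict(attrs)


attrs = to_directory_attributes(conn, 1)
print(attrs)
```

Converting such an entry back into relational form would require knowledge that is not carried by the entry itself, namely which strings were originally dates or numbers and which attributes originated from child tables.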
LDAP directories are based on a domain- and attribute-oriented data model, while RDBMS are based on an entity- and relationship-oriented data model. From a theoretical perspective, it can be shown that the two models are equivalent in expressiveness, as is understood by those skilled in the art of data modeling. For example, one piece of information represented in one model may be translated without loss into the other model. However, conventional directory implementations have not successfully realized a full implementation of the features of the domain and attribute data model, thereby destroying the possibility for lossless automatic translation from one data model to another.
Mismatched data models also result in lengthy and costly deployment for an essential infrastructure function. Nevertheless, LDAP is beneficial for several reasons. For example, LDAP is well-suited for use with directories, as compared to databases, particularly for enabling ubiquitous look-up over a network. Also, the LDAP API is supported by many conventional client computers having, for example, email or web browser functionality, so that virtually any user connected to a network may gain access to directories given the appropriate security clearance. Although the database access API structured query language (SQL) provides rich access capabilities when the data is needed locally, it alone inadequately provides secure data access over a network. In order to provide network access to database data, application programmers must use vendor-specific software drivers to enable secure data access over a network.
Accordingly, there is a need for the deployment of Internet directory services that follows a simpler and more flexible approach with consideration that a significant hurdle to overcome entails the mismatch between the hierarchical data structure of a directory and the more complex relational data models supported by the databases that house the data needed for the directory. What is needed is a way to unite “back office” applications (i.e., those applications distinctive to an enterprise and its corresponding proprietary syntax, semantics, logical information modeling, physical data modeling and other mechanisms) so as to seamlessly gain access to data from these divergent sources, and to integrate the data for value-added applications over computer networks outside each of the specific enterprises. Additionally, it is desirable to provide directory-enabled applications that rely upon a model of centralized administration. By doing so, the directory-enabled applications would allow the inclusion of richer, more complex data and data relationships in the directory than has been conventionally known. It would be beneficial if there were a standard addressing scheme for indexing each data record on the Net. With such a universal addressing scheme, a finer level of granularity of data addressing and management can be achieved, thereby enabling end-users improved access to data content.