Metadata is simply defined as “data about data”. Metadata typically describes the content, quality, condition, and/or other characteristics of data. The purposes of metadata include helping users organize and maintain an organization's or user community's internal or external investment in spatial data, providing information about an organization's or user community's data holdings to data catalogues, clearinghouses, and brokerages, and providing the information needed to process and interpret data received through a transfer from a unique, disparate or federated external source. Such a source can be centralized or distributed.
By 2005, more than fifty percent (50%) of large organizations will have multiple sources of integration technology. As that proliferation occurs, being able to recognize the use of metadata across different deployment platforms becomes extremely important. Coupled with this proliferation, the explosive growth in personal computers (PCs), servers, Internet-related software and web-based holdings has created a need for companies to better understand their internal and external data needs. To better understand these needs, many gigabytes of data must be collected and analyzed to arrive at the best way to serve the user.
Market and industry analysts alike believe that the Internet will prove to be the most significant innovation in modern history since the light bulb and the automobile. With regard to the communication of consumer-related data, the Internet will quickly surpass or encompass traditional radio and television.
The way in which daily business operations are performed will be changed forever by this new technology. Many technology-based companies in the computer industry are scrambling to outline new products and services that use the Internet as a vehicle to increase market share and revenue, while increasing productivity and cutting operational costs.
In an effort to meet the above needs of ingesting the vast amounts of information on the web, companies have designed many browsers and millions of web pages to access, retrieve and utilize this information. In addition to the Internet, companies have set up local “intranets” for storing and accessing the data used to run their organizations. However, the sheer amount of available information poses increasingly difficult challenges to conventional approaches.
A major difficulty to overcome is that information contained on the web or in web pages is often dispersed or distributed across the network at many sites. Networks themselves may be unique, disparate or federated, and may be situated in either centralized or distributed environments. It is often time-consuming for a user to visit all these sites. One conventional approach used to access this information more effectively is called a search engine. A search engine is actually a set of programs accessible at a network site within a network, for example a local area network (LAN) at a company, or the Internet and World Wide Web. One program, called a “robot” or “spider,” pre-traverses a network in search of documents and builds large index files of keywords found in the documents.
A user of the search engine formulates a query comprising one or more keywords and submits the query to another program of the search engine. In response, the search engine inspects its own index files and displays a list of documents that match the search query, typically as hyperlinks. When a user activates one of the hyperlinks to see the information contained in the document, the user exits the site of the search engine and terminates the search process.
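The two programs described above, the robot that builds keyword index files and the program that answers queries from them, can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and is not taken from any of the cited references: the document collection, the URLs, and the function names are hypothetical, the network is replaced by an in-memory dictionary, and a document is assumed to match only when it contains every keyword in the query.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a network of documents: URL -> document text.
DOCUMENTS = {
    "http://example.com/a": "Metadata describes the content and quality of data.",
    "http://example.com/b": "A search engine builds an index of keywords.",
    "http://example.com/c": "Spatial data catalogues publish metadata records.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Play the role of the robot: visit each document and index its keywords."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of documents that contain every query keyword."""
    keywords = tokenize(query)
    if not keywords:
        return []
    # Assumption: AND semantics, so intersect the posting sets of all keywords.
    matches = set.intersection(*(index.get(word, set()) for word in keywords))
    return sorted(matches)


INDEX = build_index(DOCUMENTS)
print(search(INDEX, "metadata"))
print(search(INDEX, "index keywords"))
```

Because matching is purely on keywords, a document is returned whenever the words happen to occur in it, regardless of whether the document is actually about the subject of the query.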
Examples of various search engine methods include:
Brin et al., U.S. Pat. No. 6,678,681, Gomes et al., U.S. Pat. No. 6,615,209, and Bharat et al., U.S. Pat. No. 6,526,440 disclose various search engine strategies and data extraction methods where the database itself is the Internet or a network of websites such as the World Wide Web.
Search engines, however, have their drawbacks. For example, a search engine is oriented to discovering textual information only, whether weighted or non-weighted. In particular, search engines are not well suited to indexing information contained in structured or unstructured databases, such as relational databases, voice-related information, audio- or video-related information, and metadata. Moreover, mixing data from incompatible data sources is difficult in conventional search engines.
Another disadvantage of conventional search engines is that irrelevant information is aggregated with relevant information. For example, it is not uncommon for a search engine on the web to locate hundreds of thousands of documents in response to a single query. Many of those documents are found only because they coincidentally include a keyword that appears in the search query. Sifting through thousands of search results, however, is a daunting task.
Accordingly, inventors of the present invention have determined that there is a need to be able to effectively collect the data and/or provide useful information indicative of events occurring on the web in a specified format that will speed up the collection of data, identify more clearly what data is required, and capture information about the data to make reporting more accurate. This specified format for collection is also changeable and/or expandable. For example, data which indicates where a user has been in prior sessions may be useful in designing future products accessible via and for the web. The inventors of the present invention have also determined that there is a need for a convergence platform architecture, system and methods to support and analyze Internet, electronic learning and/or electronic commerce data over or from the World Wide Web.
Inventors of the present invention have further determined that there is a need for a convergence platform architecture, system and methods used to correlate user, application, and access functions. Further, it is also determined that there is a need to provide tool sets that can easily communicate with, or become subsets of, an existing scalable data warehouse to provide Internet decision support, electronic learning and information management. Unfortunately, conventional architectures and/or techniques are unable to organize and present this information in an efficient manner. Attempts in the prior art include the following:
Ignat et al., in U.S. Pat. No. 6,611,838, disclose a method of managing metadata via a metadata exchange platform that allows for synchronization of databases.
Armatis et al., in U.S. Pat. No. 6,697,822, disclose a method to update data files using metadata consisting of unique record identifiers.
Boothby et al., in U.S. Pat. No. 5,684,990, and Pet et al., in U.S. Pat. No. 5,835,912, disclose methods to synchronize, update and transfer data and data records of disparate databases.
Noble et al., in U.S. Pat. No. 5,634,053, disclose a method to create a virtual centralized database from a plurality of interconnected local databases.
Dockter et al., in U.S. Pat. No. 5,678,038, disclose the use of database schemas for the management of classification systems.
Lau et al., in U.S. Pat. No. 6,502,098, disclose a method of transferring data using a data table hierarchy.
With the advances in technology and the increase in the number of applications, the definition of “data” and the formats in which data is presented or housed continue to grow. More generally, data is defined as facts represented in a readable language, such as numbers, characters, images, or other methods of recording on a durable medium. Data on its own carries no meaning. Empirical data are facts originating in or based on observations or experiences. A database is a store of data concerning a particular domain. Data in a database may be less structured or have weaker semantics (built-in meaning) than knowledge in a knowledge base. Data and data formats include text, graphics, Portable Document Format (PDF) files, spreadsheets, presentation slides, digitally stored video, objects, and digitally stored audio, to name only a few. Further complicating the definition, data can at times be a combination of the previously stated data types and data formats.
Due to the increasing complexity of unique, disparate or federated data warehouses in both centralized and distributed environments, a centralized and declarative management of metadata or metadata records is essential for data warehouse administration, maintenance and usage. Metadata is usually divided into technical and semantic data about data. Typically, current approaches, including those technologies in the previously cited U.S. Patents, only support subsets of these metadata types, such as data movement metadata or multidimensional metadata for On-Line Analytical Processing (OLAP).
OLAP is a category of applications and technologies for collecting, managing, processing and presenting multidimensional data for analysis and management purposes.
Further complicating the environment in which the prior art fails, the current marketplace is inundated with proprietary legacy systems, expensive technology and a plethora of point products. Also, the concept of “registries” is not new. Registries range from automotive parts information to registries for worldwide domain name archives. In today's computer world, registries such as dictionaries and catalogues have been around for a long time; examples include Novell's Directory, Microsoft's Active Directory, and the IBM mainframe catalogue. In the United States, the Do-Not-Call Registry is perhaps the best known registry today, where users can list their telephone numbers and sales groups are required to inspect the list and not call the listed users. However, these registries address different objectives and business problems.
In particular, the interdependencies between technical and semantic metadata have not yet been addressed sufficiently by the prior art. The representation of these interdependencies forms an important prerequisite for the translation of queries formulated at the business concept level into executable queries on physical data.
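By way of illustration only, the translation of a business-level query into an executable query can be sketched as a lookup from semantic metadata (business concept names) to technical metadata (physical tables and columns). The sketch below is not the method of the present invention or of any cited reference. The concept names, the table and column names, and the `translate` function are hypothetical, and the sketch assumes that all concepts in a query reside in a single table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalColumn:
    """Technical metadata: where a business concept is physically stored."""
    table: str
    column: str


# Hypothetical mapping from semantic metadata (business concepts)
# to technical metadata (physical table and column names).
CONCEPT_TO_PHYSICAL = {
    "customer name": PhysicalColumn("crm_cust", "cust_nm"),
    "customer region": PhysicalColumn("crm_cust", "rgn_cd"),
}


def translate(select_concepts, where_concept, where_value):
    """Translate a business-level query into a parameterized SQL string."""
    selected = [CONCEPT_TO_PHYSICAL[c] for c in select_concepts]
    condition = CONCEPT_TO_PHYSICAL[where_concept]
    tables = {p.table for p in selected} | {condition.table}
    if len(tables) != 1:
        # Assumption: joins across tables are outside the scope of this sketch.
        raise ValueError("this sketch only handles concepts stored in one table")
    table = tables.pop()
    columns = ", ".join(p.column for p in selected)
    sql = f"SELECT {columns} FROM {table} WHERE {condition.column} = ?"
    return sql, (where_value,)


SQL, PARAMS = translate(["customer name"], "customer region", "EMEA")
print(SQL)
print(PARAMS)
```

Without a maintained link between the two kinds of metadata, the mapping in such a sketch has to be written by hand for every physical source, which is the gap the preceding paragraph describes.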
With regard to specific content housed in a plurality of unique, disparate or federated database environments, the prior art fails in that its technologies are directed towards content, or a mixture of content and metadata, which requires substantial memory space, personnel data-entry time and expensive equipment. Prior art systems such as learning content management systems (LCMS), document management systems and content management systems (CMS) all relate to the content itself, and the content is housed or synchronized at a centralized site.
The present invention provides for interoperability and increases the utilization of the content to which metadata relates. Embodiments of the present invention provide convergence platforms, systems and methods that are based on standards relevant to the users, providing context to their jobs and organizations.