With increasing frequency, information scientists, systems analysts, and other knowledge workers are obliged to quickly acquire broad-based knowledge of unfamiliar information domains, such as legacy databases. Often the knowledge sources are embodied in a stream of open source data, such as a network of World Wide Web sites or a proprietary document repository. When this occurs, the principal impediment to knowledge acquisition can be the sheer volume of textual data that must be navigated.
Absent computational support, workers assigned to such enterprises are obliged to pore over the documents to extract titles, tables of contents, and indices. Once collated and compiled, this information provides an initial map of the knowledge source, at least for those documents with informative titles, tables of contents, and indices. Documents that lack them must be reviewed individually in order to isolate the same kind of information. This laborious process includes, for each document, a survey of its contents, an evaluation of the suitability of those contents to the goals of the investigation, and a prioritization of its information content with respect to the growing body of information about the domain. Because the cost of manual intervention is so high and its result so uncertain, teams of domain specialists are usually required to support these knowledge workers and to increase the likelihood of success for the collective effort. Knowledge acquisition in unfamiliar domains can consequently be expensive and time-consuming. Therefore, there exists a need to provide a tool to efficiently analyze unfamiliar databases.
In addition, data designs inevitably fossilize if their dependency architectures must be modified in order to accommodate change. The key indicator of design fossilization is that a change to the semantics of one data element propagates change to all semantically dependent data elements. This is characteristic of structured design, where change to low-level details impacts the semantics of high-level policy. As the original dependency architecture degrades, the nature and extent of this propagation become less predictable. When the propagation of semantic change cannot be reliably predicted, the cost of design changes cannot be estimated. Eventually, data and process owners become reluctant to authorize changes, with the result that change proposals are discouraged and deemed suspect. Ultimately, owners freeze the change management process and the design fossilizes, initiating the final stage in the economic lifetime of the data asset.
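The propagation of semantic change described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a design is represented as a mapping from each data element to the elements that depend on it directly; the element names and the function `affected_elements` are hypothetical and are not drawn from any particular system.

```python
from collections import deque


def affected_elements(dependents, changed):
    """Return every element that directly or transitively depends on `changed`.

    `dependents` maps an element name to the names of the elements that
    depend on it directly (a hypothetical representation of the design).
    """
    seen = set()
    queue = deque([changed])
    while queue:
        element = queue.popleft()
        for dep in dependents.get(element, ()):
            # Skip elements already recorded, and the changed element itself.
            if dep not in seen and dep != changed:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical design: a low-level field feeds progressively higher-level elements.
dependents = {
    "unit_code": ["line_item", "price_rule"],
    "line_item": ["invoice"],
    "price_rule": ["invoice", "discount_policy"],
    "invoice": ["revenue_report"],
}

impacted = affected_elements(dependents, "unit_code")
print(sorted(impacted))
```

In this sketch a change to the single low-level element `unit_code` reaches every higher-level element in the design. The sketch presumes the dependency map is complete and accurate; once it no longer reflects the actual design, the impacted set, and with it the cost of a change, can no longer be computed reliably.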
One diagnostic for fossilized data designs is that they characteristically propagate semantic change imposed on low-level data elements to higher-level dependents. From the perspective of data and process owners, a design change may give rise to problems in aspects of the design that have no conceptual relationship with the changed element. For example, the remedy for one problem may lead to other, seemingly unrelated problems. The quality of data for certain aspects of the design is thrown into doubt, and the potential for data reuse diminishes accordingly. In effect, design fossilization erects a firewall of non-reusability around data assets. Certainly this is not the intended effect.
Therefore, there exists a need to forestall the process of design fossilization and facilitate the productive utilization and reuse of data assets.