In the fields of knowledge-engineering, database management, modeling, simulation, and expert systems, one common problem relates to forming valid optimization strategies over domains having constituent data-sets of assorted characters.
In this context, data-sets of assorted characters refers to data-sets that differ with respect to data structure complexity, data resolution, data quantification, or any combination thereof. Data structure complexity, data resolution, and data quantification may each relate to one-dimensional metrics or to multi-parametric characterizations.
In the context of the present document, data structure complexity, hereinafter “complexity”, generally relates to local interconnectivity between a data element being characterized with respect to complexity and other data elements, and similarly global interconnectivity between any data-set, which includes this data element, and other data-sets. For example, a root node in a binary tree locally has two children branches of its own, and similarly may globally have many relationships that relate it to root nodes of other data structures.
In the context of the present document, data resolution, hereinafter “resolution”, generally relates to an embedded relational concept wherein data-sets and proper data subsets are identified. The subset has a higher resolution than the superset, in that detailed data is placed in the subset while overview data is placed in the superset. For example, a superset may be a workflow overview organization, while subsets contain detailed charts of productivity measurements for each station in the workflow process.
In the context of the present document, data quantification, hereinafter “quantification”, generally relates to a common sense notion of measurement precision. For example, in physics or chemistry it is common to measure a phenomenon to some known precision (e.g. velocity at mm/sec or pH to four decimal places), while in market surveys it is common to measure customer satisfaction using perhaps two to five select-only-one categories. While the average for a large number of surveyed customers may reach the same numerical precision as a physical measurement for a perhaps smaller number of samplings, nevertheless common sense still says that the physical measurement is a more realistic quantification than the survey result.
At the present juncture, it is necessary to appreciate that quantification disparities exist, and that known systems' design methodologies encourage relating data-sets of like quantification while discouraging relating data-sets of disparate quantification. Likewise, in a non-systems context, one could internally assign synthetic fractional quantification measures to semantic data-sets, and thereby presumably differentiate between their relative degrees of linguistic ambiguity, nomenclature variability, etc. However, synthetic fractional quantification measures used in a semantic environment would need to remain differentiated from quantification measures for their associated referents, at least so as to avoid the semiotic confusion of a symbol with its referent.
There are many examples of system-type problems related to forming valid optimization strategies over domains having constituent data-sets of assorted characters. According to one such example, there would be benefits if one could validly combine consumers' perceptions of fruit and vegetable quality with the agronomists' data capture universe, wherein are recorded precise measures of genetic makeup, growing conditions, biochemical variations, etc. According to another example, there would be benefits if one could validly combine demographic and actuarial databases with personal medical records and medical research data. Today, validly forming such strategies is a haphazard undertaking, often of questionable objective value. More generally stated, there would be benefit if one could validly posit optimization strategies over domains having constituent data-sets of assorted characters, differing in complexity, resolution, and quantification.
Database management and knowledge-engineering represent a class of computer-implemented strategies for addressing such problems. Database management relates to organizational tools for establishing and maintaining data-sets of assorted character. For example, Boyce-Codd normal forms address tradeoff issues of efficiency and redundancy in very large purpose-specific data banks. However, database management does not address how to best benefit from knowledge that is held in these data banks.
Accordingly, there has arisen a discipline, currently called knowledge-engineering, that attempts to generalize knowledge characterization strategies over heterogeneous domains having constituent data-sets of assorted characters, differing in complexity, resolution, and quantification. To date, knowledge-engineering's most significant contribution has been the semantic search engine, which has subtle embodiment variations called search robots, search agents, data mining tools, etc. While search engines have proved to be very versatile tools for data-sets dominated by semantic content, they have not yet evolved into methodologies that provide meaningful linkages with data-sets having quantified characters. Thus, the general need in the art remains to validly posit optimization strategies over domains having constituent data-sets of assorted characters, differing in complexity, resolution, and quantification.
A number of other classes of computer-implemented strategies are currently fashionable for addressing such problems. Examples of such strategies include modeling, expert systems, statistical process control, and neural networks. While each of these strategies has contributed some modest advance over its respective prior art, it is generally appreciated that these strategies are insufficiently modular to allow facile integration of the new conceptualizations that their implementation brings into consideration. Furthermore, the validity of the design process that facilitates a computer implementation of any of these strategies often depends on the level of genius of the design team. Clearly, this is an inherent weakness, the alleviation of which would be of benefit in countless technological and econometric disciplines, especially if the method of alleviation is conceptually facile and straightforward for computerized implementation.
More specifically, a critical discussion of modeling, expert systems, statistical process control, and neural networks follows.
Modeling may be generally described as a low complexity topological graph describing node relations, wherein each node corresponds to a data structure of empirical data. The nodes all relate to a common lower resolution and to like quantification, whereas the associated data structures relate to various higher resolutions, with quantification that is uniform within each data structure but not necessarily between data structures. The model is then used to simulate how the modeled system might react to a hypothetical perturbation of some of the empirical data.
Typically, modeling is applied in situations where there are many variables having complex interactions, especially where some of these interactions must be described using non-linear equations or using random variation functional components. Modeling is also applied in situations where visualizations, of the variables and their interactions, are believed to contribute to understanding aspects of the system being modeled.
Conceptually, the simplest models posit a pair-wise functional relationship between variables, such that each variable is a node of the topological graph and the pair-wise relationship describes the low complexity. The higher resolution data-sets then are used to describe an empirical manifold in the multi-dimensional space, as described by the pair-wise functionally orthogonal variables. Ordinary algebra, calculus, or statistics is then applied to simulate hypothetical empirical situations.
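By way of a hedged illustration only, the simplest class of model described above may be sketched in a few lines of Python. The sketch assumes that the pair-wise relationship is linear and is fitted by ordinary least squares; the names `fit_line` and `simulate`, and the temperature and yield figures, are hypothetical and are not taken from any cited work.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one pair of variables."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def simulate(model, x):
    """Evaluate the fitted relationship at a hypothetical value of x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical higher resolution empirical data for two variables (two graph nodes).
temperature = [10.0, 20.0, 30.0, 40.0]
yield_pct = [52.0, 54.0, 56.0, 58.0]

model = fit_line(temperature, yield_pct)
# Simulate a perturbation beyond the observed range.
predicted = simulate(model, 50.0)
```

Here the fitted pair is the low complexity part of the model, while the lists of observations play the role of the higher resolution data-sets; a non-linear or statistical relationship could be substituted without changing the overall shape of the sketch.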
Conceptually, a more complex class of models posits multivariate functional relationship between assorted combinatorial groupings (n-tuples) of variables, wherein the aggregate of relationships join all of the variables into a single topological graph. Somewhat like the simpler models, higher resolution data-sets then are used to describe an empirical manifold for each relationship between the assorted combinatorial groupings of variables. Integrating a relational rule set with ordinary algebra, calculus, or statistics then allows hypothetical empirical situations to be simulated.
Conceptually, a most complex class of models posits embedding of either or both of the above described models within nodes of the more complex class of models. The designing and integrating of relational rules then becomes a cumbersome task that depends on the level of genius of the design team, especially for computer implementations. Likewise, the classes of hypothetical empirical situations to be simulated are generally limited by the structure of the design.
In order to escape from this type of limitation, a tedious class of modeling tools called expert systems has been developed. Conceptually, expert systems shift the focus of the simulation from the empirical data manifolds to the designing and integrating of relational rules. Since it is presumed that the experts have subsumed the empirical manifolds, simulating hypothetical empirical situations at the manifold level is replaced by simulating a higher complexity topological graph describing node relations. Expert systems then become a most complex class of models that are critically limited by the structure of their design. Methodologically, the only way to improve an expert system is by implementing a longitudinal study of interviewing experts and integrating their changes of mind and mood.
Another class of modeling tools, called process control models, has been developed. Here, the complexity of functional relationships between variables is grouped as a single node for each station in a process, and the topological graph of node relationships is according to the complexity of the process being modeled. Furthermore, each station in the process is internally amenable to any of the above modeling methodologies including expert systems, albeit as constrained by the inputs and outputs for each station. Independently, the overall process is likewise amenable to benefit from using any of the above modeling methodologies including expert systems, albeit as constrained by the topology of the process. Simply stated, process control focuses simulation and decision resources on a limited class of optimization hypotheses that are constrained by the topology of the process.
Process control models are chosen in circumstances where the overall process is pragmatically optimized by locally optimizing the process at each station. Furthermore, for most applications, process control focuses simulation and decision resources on a limited class of optimization hypotheses that are constrained by using the simplest modeling techniques for each station. For this reason, statistical process control tools, neural network tools, and similar tools have become popular, in that they can be facilely applied to any station, as if that station were isolated from factors at other stations.
In statistical process control (hereinafter SPC), gross statistically derived threshold-type limits are assigned individually for metrics associated with inputs or outputs at a station; wherein each of these metrics was considered in isolation, in conceptually similar ways to that used in the simplest class of modeling and simulation.
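A minimal sketch of how such a threshold-type limit might be derived for a single metric, considered in isolation, is given here. It assumes the common convention of placing the limits three sample standard deviations either side of the mean; that convention, the names `control_limits` and `out_of_control`, and the baseline figures are illustrative assumptions rather than anything prescribed in this document.

```python
from statistics import mean, stdev


def control_limits(samples, k=3.0):
    """Return (lower, upper) limits at k sample standard deviations from the mean."""
    centre = mean(samples)
    spread = stdev(samples)
    return centre - k * spread, centre + k * spread


def out_of_control(value, limits):
    """True when a single measurement falls outside the limits."""
    low, high = limits
    return not (low <= value <= high)


# Hypothetical baseline measurements of one metric at one station.
baseline = [9.9, 10.1, 10.0, 10.2, 9.8, 10.0]
limits = control_limits(baseline)
```

Each metric would receive its own pair of limits in this manner, without reference to any other metric at the same or any other station.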
For example, an SPC station may assemble two primitive components C1 and C2 together to form an aggregated component C3. Each of these components has statistically defined acceptable tolerance limits for at least one measurable aspect of the component; C1 (min, max), C2 (min, max), and C3 (min, max). The presumption is that if all C1 components are in the range C1 (min, max) and if all C2 components are in the range C2 (min, max), then all C3 components will be in the range C3 (min, max). Simply stated, using SPC tells us to set off an alarm and call a control process engineer whenever C3 components are measured to be out of the range C3 (min, max); and this actually happens even if C1 and C2 components were within their acceptable tolerance limits.
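The C1, C2, and C3 example may be sketched as follows, under the stated assumption that the measured aspect of C3 is simply the sum of the corresponding aspects of C1 and C2. The numerical limits, the `Tolerance` class, and the `assemble` and `alarm` functions are hypothetical and serve only to show how an alarm can arise while both inputs are within tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Acceptable (min, max) range for one measurable aspect of a component."""
    low: float
    high: float

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical limits, in millimetres.
C1 = Tolerance(9.9, 10.1)
C2 = Tolerance(4.9, 5.1)
C3 = Tolerance(14.9, 15.1)


def assemble(c1: float, c2: float) -> float:
    """Assumed assembly rule: the C3 dimension is the sum of its parts."""
    return c1 + c2


def alarm(c1: float, c2: float) -> bool:
    """True when C3 is out of range although C1 and C2 were both accepted."""
    return C1.accepts(c1) and C2.accepts(c2) and not C3.accepts(assemble(c1, c2))
```

With these assumed figures, `alarm(10.1, 5.1)` is true because both parts sit at the upper edge of their ranges, whereas `alarm(10.0, 5.0)` is false.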
When out of specification C3 components are produced, the process control engineer first decides either to stop the process or to let the process continue. Typically, the process is stopped when the result is potentially catastrophic, such as in nuclear power plant SPC or in chemical synthesis of essential therapeutic drugs. Otherwise, the process control engineer may elect to let the process continue, even though the resultant out of specification C3 components may be worth much less than in specification C3 components.
Regardless of the process control engineer's decision, there is a need in the art for a method of improving SPC. More specifically, there is a need in the art for automatic tools to aid the process control engineer in returning the process to producing C3-type components within acceptable tolerance limits.
One aspect of this standard SPC problem is that there is an accumulation of contingent degradation of tolerances, in a concatenation of specifications for a plurality of interdependent stations. Simply stated, when there is a plurality of independently defined specification limits, these specifications actually convolute at a higher resolution into a configuration where not every combination of input specification parameters yields an acceptable final station output result. Thus, there is a need in the art for a tool that allows SPC specifications to be convoluted at a higher combinatorial resolution.
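The effect of convoluting independently defined limits may be illustrated by enumerating combinations. The sketch below assumes, purely for illustration, a station whose output is the sum of two inputs, and counts how many in-specification input pairs yield an in-specification output. The limits, the grid step, and all names are hypothetical.

```python
from itertools import product

# Hypothetical limits in hundredths of a millimetre (integers avoid rounding noise).
C1_RANGE = (990, 1010)
C2_RANGE = (490, 510)
C3_RANGE = (1490, 1510)
STEP = 5


def grid(limits, step=STEP):
    """Evenly spaced candidate values covering an inclusive (low, high) range."""
    low, high = limits
    return range(low, high + 1, step)


def acceptable_combinations():
    """Return (accepted, total) over every in-specification (C1, C2) pair."""
    pairs = list(product(grid(C1_RANGE), grid(C2_RANGE)))
    accepted = sum(1 for c1, c2 in pairs if C3_RANGE[0] <= c1 + c2 <= C3_RANGE[1])
    return accepted, len(pairs)


accepted, total = acceptable_combinations()
```

Under these assumptions only 19 of the 25 in-specification input pairs produce an acceptable output, which is the kind of higher combinatorial resolution that the individually defined limits do not capture.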
Another way to appreciate this need is to consider SPC as a model of a multivariate functional relationship wherein an upper bound threshold manifold and a lower bound threshold manifold represent the solution limits for a predetermined volumetric region in an orthogonal solution space. Clearly, only in unusual circumstances, such as when the manifolds are parallel and also slice through the predetermined volumetric region in an absolutely orthogonal fashion, will the convolution of the SPC limits be equivalent for both low-resolution and high-resolution specifications. However, if the manifolds are parallel and also slice through the predetermined volumetric region in an absolutely orthogonal fashion, then virtually none of the variables in the domain of the multivariate functional relationships affect the results.
In neural networks, high-resolution empirical data is accumulated and correlated with low-resolution decision data, substantially in order to define limits like those that were defined in the SPC method. Neural networks are used in situations where setting specification threshold limits for inputs is excessively complex, often because input variables being measured are highly interdependent, and simultaneously where setting threshold limits for outputs is well understood or at least easy to define. Here too, there is a need in the art for a tool that contributes to defining acceptable tolerances for aspects of inputs to a neural network evaluated process, so as to beneficially improve metrics of productive throughput for that process.
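As a deliberately reduced illustration, the correlation of empirical input data with accept or reject decisions can be sketched using a single perceptron, which is the simplest form of neural network and far smaller than the networks used in practice. The training samples, the learning rule with a unit step size, and the names `predict` and `train` are assumptions made for this sketch only.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 (accept) when the weighted sum is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train(samples, epochs=20):
    """Perceptron learning rule with a unit learning rate."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias


# Hypothetical data: the output is acceptable only when both
# interdependent inputs are simultaneously high.
SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(SAMPLES)
```

The learned weights reproduce the decisions, yet they do not by themselves state an acceptable tolerance for either input, which is the shortcoming noted above.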
Another way to appreciate this need is to consider a neural network as a model of a multivariate functional relationship wherein a very complex topological shape constitutes the solution limits for a predetermined volumetric region in an orthogonal solution space. While this may be correct, no additional understanding or progress may be derived from this solution. Therefore, when neural networks are used, improvements and innovations of the process are conceptually inhibited.
In accordance with all of the aforesaid general background, there is a need in the art for a knowledge-engineering protocol-suite:
to provide a unified frame of reference for the numerous aspects of knowledge-engineering;
whereby new knowledge-engineering apparatus and appurtenances may be independently designed to integrate facilely with each other; and
that substantially provides a framework through which existing knowledge-engineering products may be compared, functionally de-convoluted, and seamlessly integrated to form large-scale knowledge-engineering systems.
Most professionals working in knowledge-engineering are familiar with the Open Systems Interconnection (OSI) reference model of the International Organization for Standardization (ISO). This well-known OSI model is a common point of reference for categorizing and describing network devices, protocols, and issues. Countless network devices are designed to operate at certain OSI protocol levels. Likewise, in today's ensemble of network protocols, virtually each of the known protocols can be mapped onto the OSI reference model. Accordingly, it would be of tremendous benefit if a knowledge-engineering protocol-suite could be provided that builds on this familiarity with the OSI model.
The OSI reference model offers a seven-layer model structure defining the “ideal” network communication architecture. This model allows communication software to be broken into modules. Each layer provides services needed by the next layer in a way that frees the upper layer from concern about how these services are provided. This simplifies the design of each layer.
With the emergence of open systems, the OSI model set rules that would allow different manufacturers to build products that would seamlessly interact. One of the key areas of importance is the interoperability of network technologies. As a result, this model was designed for the development of network protocols. Although few protocols in widespread use were developed strictly according to this model, it has come to be accepted as a standard way of describing and categorizing existing protocols.
OSI conceptually puts names to the different tasks that a computer network has to fulfill. The ISO model defines seven layers, providing a logical grouping of the network functions. This model is good for teaching, and for planning the implementation of a computer network. Furthermore, dividing functionality in defined layers has the advantage that different parts of the network can be provided from different vendors and still work together.
When describing the different layers, one starts from the bottom and proceeds up through the upper layers. This is because some of the functionality and problems of the higher layers result from properties of the lower layers. The network stack used in the Internet illustrates the fact that a network is usually not implemented exactly as described in the OSI model. One protocol stack in use is referred to as the TCP/IP (Transmission Control Protocol/Internet Protocol) stack.
In order to appreciate today's network architectures and devices, it is important to understand the seven layers of the OSI model and their respective functions. The OSI reference model protocol layers, each with a unique function, are as follows:
OSI Physical Layer (layer 1) is where the cable, connector, and signaling specifications are defined. This layer provides mechanical, electrical, functional, and procedural means to activate and deactivate physical transmission connections between data-links. This layer is concerned with the encoding and decoding of digital bits (1s and 0s) between network interfaces. It is typically a function of the interface card, rather than a software utility.
OSI Data-link Layer (layer 2) deals with getting data packets on and off the wire, error detection and correction, and retransmission. This layer is generally broken into two sub-layers: the LLC (Logical Link Control) on the upper half, which does the error checking; and the MAC (Medium Access Control) on the lower half, which deals with getting the data on and off the wire. This layer provides functional and procedural means for connectionless-mode transmission among networks. The data link layer is concerned with the transmission of packets from one network interface card to another, based on the physical address of the interface cards. Typical data link protocols are Token Ring and Ethernet. The device driver that comes with the network interface card typically enables these protocols. The device driver will be loaded in a specific order with the other protocol programs. The data link layer is a point-to-point protocol, much like an airline flight. If you have a direct flight, one plane can get you to your final destination. However, if you have a connecting flight, the plane gets you to your connection point, and another will get you from there to your destination, but it's up to you to make the connection yourself. Bridges operate at this layer.
OSI Network Layer (layer 3) makes certain that a packet sent from one device to another actually gets there in a reasonable period of time. Routing and flow controls are performed here. This is the lowest layer of the OSI model that can remain ignorant of the physical network. This layer provides a means of connectionless-mode transmission among transport entities. It makes transport entities independent of routing and relay considerations associated with connectionless-mode transmission. The network layer is concerned with the end-to-end delivery of messages. It operates on the basis of network addresses that are global in nature. Using the airline example, the network layer makes sure that all the connecting flights are made, so that you will actually arrive at your final destination. Network layer protocols include the IPX portion of the NetWare IPX/SPX protocol and the IP portion of the TCP/IP protocol stack. Routers operate at this level.
OSI Transport Layer (layer 4) makes sure the lower three layers are doing their job correctly, and provides a transparent, logical data stream between the end user and the network service being used. This is the lowest layer that provides local user services. This layer provides transparent data transfer between sessions and relieves them of concern about achieving reliable and cost effective data transfer. SUPER-UX supports Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). The transport layer is concerned with issues such as the safe, intact arrival of messages. It makes the receiver aware that it is going to receive a message, ensures that it does get it, and can control the flow of the message if the receiver is getting it too fast, or re-transmit portions that arrive garbled. In our airline analogy, suppose you are flying your children to Grandma's house unaccompanied. The data link layer planes will make their flights. A small fee will ensure that network layer ground attendants get your kids from one flight to their connection. The transport layer will call Grandma to let her know they are coming and what their luggage looks like, and will expect a call from Grandma when she has them safe and sound. Typical transport layer protocols are the SPX portion of NetWare IPX/SPX and the TCP portion of TCP/IP.
OSI Session Layer (layer 5) is where communications between applications across a network are controlled. Testing for out-of-sequence packets and handling two-way communication are handled here. This layer provides the services needed by protocols in the presentation layer to organize and synchronize their dialogue and manage data exchange. The session layer is the layer that manages all the activities of the layers below it. It does this by establishing what is called a virtual connection. Essentially a virtual connection is established when a transmitting station exchanges messages with the receiving station, and tells it to set up and maintain a communications link. This is similar to what happens when you log into the network. Once you have logged in, a connection is maintained throughout the course of your user session until you log out, even though you may not be accessing the network continuously.
OSI Presentation Layer (layer 6) is where differences in data representation are dealt with. For example, UNIX-style line endings (LF only) might be converted to MS-DOS style (CRLF), or EBCDIC to ASCII character sets. This layer manages the representation of the information that application layer protocols either communicate or reference during communication. The presentation layer's function is to establish a common data format between communicating nodes. It is responsible for formatting the data in a way the receiving node can understand. It may also perform data translation between different data formats. Examples of data format differences include byte ordering (should it be read from left to right, or vice versa) and character set (ASCII characters or IBM's EBCDIC character set) as well as differences in numeric representation.
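The kinds of translation attributed to the presentation layer may be illustrated with the Python standard library. The sketch assumes code page 037 (Python's `cp037` codec) as the EBCDIC variant, since EBCDIC exists in several code pages, and it treats UNIX-style line endings as a single line feed; the function names are hypothetical.

```python
import struct


def unix_to_dos(text: str) -> str:
    """Convert LF line endings to CRLF without doubling any existing CRLF."""
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def ebcdic_to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC (code page 037, an assumed variant) bytes to ASCII bytes."""
    return data.decode("cp037").encode("ascii")


def to_big_endian(value: int) -> bytes:
    """Pack an unsigned 32-bit integer with the most significant byte first."""
    return struct.pack(">I", value)


def to_little_endian(value: int) -> bytes:
    """Pack the same integer with the least significant byte first."""
    return struct.pack("<I", value)
```

For instance, `to_big_endian(1)` and `to_little_endian(1)` contain the same four bytes in opposite order, which is the byte ordering difference that a common data format must resolve.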
OSI Application Layer (layer 7) is where the user applications software lies. Such issues as file access and transfer, virtual terminal emulation, inter-process communication, and the like are handled here. This layer serves as the window between corresponding application processes that are exchanging information. The application layer provides the user-accessible services of the network. These services include such things as network file transfer and management, remote job initiation and control, virtual terminal sessions with attached hosts, electronic mail services, and network directory services.
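The way in which each layer serves the layer above it, while hiding how that service is provided, may be sketched as nested encapsulation. In this toy sketch every layer from the application layer down to the data-link layer prepends a bracketed label in place of a real header, and the physical layer is taken merely to carry the resulting frame; the label format and the `send` and `receive` names are assumptions for illustration and do not correspond to any actual protocol.

```python
# Layers that add a (toy) header, listed from the top of the stack downward.
# The physical layer is omitted because it only carries the finished frame.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data-link"]


def send(message: str) -> str:
    """Pass a message down the stack; each layer wraps what it receives."""
    frame = message
    for layer in LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def receive(frame: str) -> str:
    """Pass a frame up the stack; each layer strips only its own header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame
```

Because each layer inspects only its own header, any one layer could be replaced by another implementation without the layers above or below it being altered, which is the modularity that the reference model is credited with.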
This seven-layer OSI reference model has proved to be a great conceptual catalyst for today's rapid developments of network infrastructure apparatus and associated software systems. Recalling the definitions presented at the beginning of this general background section, specifically for “complexity”, “resolution” and “quantification”, there is a need in the art for models that can accommodate modeling domains that differ greatly with respect to “complexity,” “resolution,” and “quantification”. More specifically, it would be of tremendous benefit if a single knowledge-engineering protocol-suite could not only be built on the existing familiarity with the OSI model but also be facilely applied to disparate applications; such as those that differ greatly with respect to “complexity,” “resolution,” and “quantification”.
The following technical articles and citations, patents, Internet accessible web-pages, and the like are thought to be useful for understanding the history of the art, the current state of the art, and the present needs and failings of the art. While it is presumed that the man of the art is already familiar with the substance conveyed by these items, others may find, in these items, concepts and descriptions that will advantageously supplement their appreciation of the present invention. Therefore, the citations given in this section do not constitute a disclosure for the man of the art, nor should they be considered as uniquely disclosing salient aspects of the prior art.
Expert Systems: Expert Systems—Design and Development, John Durkin; Prentice Hall International Inc. 1994, ISBN 0-13-348640-0, pp. 4–25.
Process Control: “Yield Analysis Software Solutions”—Pieter Burggraaf; Semiconductor International January 1996, pp. 79–85.
Statistical Process Control: Quality Control Handbook—Fourth Edition—J. M. Juran (Editor) McGraw-Hill Inc., 1988, 24.1–22 & 26.39–46.
Neural Networks: “An Introduction to computing with Neural Networks”—Richard P. Lippmann; IEEE ASSP Magazine April 1987, pp. 4–22.