1. Field of the Invention
The present invention relates generally to modeling and simulating the operation and performance of communications networks and distributed processing systems, and more particularly to methods for modeling the data traffic generated by the nodes of such networks and systems which improve the accuracy of the simulations.
2. Description of Related Art
The merging of computers and communications during the past two decades has led to explosive growth in computer communication networks and distributed processing systems. While the size and complexity of today's communication networks vary broadly, from networks of multiple processors on circuit boards or chips to global networks with thousands of users, the basic function of all communication networks is the same: namely, the transmission of information through a shared facility composed of switches, multiplexers, concentrators, processors and communication links. The sharing of resources among network users is governed by protocols, as well as management and control strategies.
The central issues in the design, implementation, and operation of communication networks are performance evaluation and trade-off analysis. The performance measures used to evaluate communication networks vary, depending on the type of network being analyzed and the application (e.g., military or commercial, voice or data, etc.). For military communication systems, the major performance requirements include network vulnerability and survivability in a stressed environment. For commercial communication networks, the most common performance measures for circuit-switched networks include probability of blocking, grade of service, throughput, and network efficiency. Throughput, delay, and resource utilization are the most important performance measures for store-and-forward-type commercial networks. Similar performance concerns govern the design of distributed processing systems.
There are a number of different types of network protocols and configurations for interconnecting computers with which those skilled in the computer networking art will be familiar. As an example of a computer network, one of the most popular distributed system architectures has at its core a hierarchical communication network structure like that shown in FIG. 1. This hierarchical network structure, called a Campus Area Network (CAN), consists of a high-speed (100 Mbps or greater) backbone, for example, Fiber Distributed Data Interface (FDDI), which connects to a number of lower-speed workgroup networks. Each of these workgroup networks, in turn, can be formed by hierarchically interconnecting a number of Local Area Networks (LANs), for example, 10 Mbps Ethernet or IEEE 802.3. Each workgroup network is used to interconnect the computers, workstations, and so on for a given workgroup. The backbone, then, allows data sharing and communication between the various workgroups.
This example of a CAN is configured to serve a very large and diverse community of engineering and administrative users. The network could eventually consist of a number of hierarchically connected LANs, each with a tree-shaped topology, as is illustrated in FIG. 3. These LANs can be, in turn, interconnected by an FDDI LAN. As can be seen from FIGS. 2 and 3, each station on the FDDI ring is an IEEE 802.3 bridge which is connected to either a Multiport Transceiver (MPT), a 10-Base-T hub or an Ethernet backbone. Any of these Ethernet subnets supports up to ten connections to devices such as engineering workstations (ESW, W/S), personal computers (PC), fileservers (FS, F/S), and so on, for a total of several hundred devices growing eventually to a few thousand.
There are also a number of protocols for interconnecting lower level processing functions with which those skilled in the art will be familiar, including the IEEE Futurebus+ P896.1. An example of a multiprocessing system based on the Futurebus+ protocol is illustrated in FIG. 4. It can be seen from the figure that such a system can in turn form the node for a larger communications network through a connection to a Local Area Network (LAN).
It is not difficult for those skilled in the art of computer networking to appreciate the magnitude of the problem of configuring an actual working system of the types described in FIGS. 1 and 4. The cost of complex communication network hardware and software, and of configuring an operational network, is not insignificant. In the past, unproven designs have frequently led to unnecessary procurement and to the disruption of day-to-day network operations.
Current methods for evaluating the performance of networks include the use of direct measurement techniques, analysis through the use of mathematical models and actual simulation of the network. Direct measurement through the use of monitoring equipment attached to the network is time consuming and indicates performance for the existing network only. Predicting the impact on performance of the network when adding nodes or applications is unreliable in most instances. These tools can be used, however, for providing statistical data to construct simulation models.
Analysis of a network based on mathematical models can require many restrictive assumptions to develop manageable models; it is generally impossible to develop closed-form solutions except in idealized and often oversimplified cases.
Simulation of the operation of a particular network, however, can more easily incorporate empirical models and measured data and can be used to more accurately predict actual network performance, analyze trade-offs, and evaluate alternatives.
Network simulation can play an important role in the design, analysis and management of communication networks. Simulation results can be used to evaluate and optimize network performance. Typical performance measures computed via simulations are system throughput, average response time, resource utilization, blocking and loss probabilities, recovery time after failures and other parameters associated with the transient response of networks. These results can be used to evaluate network design alternatives at component and architectural levels, predict performance before purchasing and trying components, and generate configuration and performance specifications for procurement.
Simulation can also play an important role in network management. A network manager needs tools to predict changes in network performance due to changes in network traffic, topology, hardware and software. The manager needs a tool to provide answers to "what if" scenarios, assist in planning for capacity changes, load balancing, building network survivability scenarios, and anomaly investigation. Rather than experimenting with the actual network, the manager can use simulation tools to evaluate alternate strategies for network management.
To simulate the operation of a network architecture such as the CAN example shown pictorially in FIG. 2 and schematically in FIG. 3, a model must be specified which represents this network in its entirety. Some level of abstraction and aggregation of constituents is typically incorporated where possible to reduce complexity of the model. The simulation then actually exercises the model through the imposition of discrete events to which the model reacts.
The simulation technique typically used to evaluate network performance is discrete-event simulation. In discrete-event simulation, various components of the actual network under study (communication links, buffers, access strategies, network control structures) are represented by models in a computer program. The events that would occur during the actual operation of the network (arrival, transmission, routing and departure of messages; error conditions such as message loss or corruption; link/node failures and recovery) are mimicked during the execution of the program to actually exercise the models.
The function of the simulation program is to generate events to which the models respond. The response of the models simulates the network's response. The simulation program also gathers data during the simulation and computes performance measures. Clearly, the more accurately the models mirror the constituents of the network, the more accurately they will model the response of the actual network to the above-listed events. It should also be clear, however, that the accuracy of the simulation also depends upon how accurately the events of the real network are mimicked in exercising the models.
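By way of illustration only, the event-generation loop described above can be sketched as follows. This is a minimal sketch, not the implementation of any commercial simulator; the names `Simulator`, `schedule`, `run` and `demo`, and the fixed 0.5 second service time, are assumptions introduced for the example.

```python
import heapq
from itertools import count


class Simulator:
    """Minimal discrete-event simulator: a clock plus a time-ordered event queue."""

    def __init__(self):
        self.now = 0.0
        self._queue = []      # heap of (time, sequence, handler, payload)
        self._seq = count()   # tie-breaker so equal-time events run in FIFO order

    def schedule(self, delay, handler, payload=None):
        """Register an event to occur `delay` seconds after the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), handler, payload))

    def run(self, until=float("inf")):
        """Pop events in time order, advance the clock, and let the model react."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(self, payload)


def demo():
    """Two messages arrive at a link; each departs after an assumed 0.5 s."""
    log = []

    def arrival(sim, msg):
        log.append((sim.now, "arrival", msg))
        sim.schedule(0.5, departure, msg)

    def departure(sim, msg):
        log.append((sim.now, "departure", msg))

    sim = Simulator()
    sim.schedule(1.0, arrival, "m1")
    sim.schedule(1.2, arrival, "m2")
    sim.run()
    return log
```

In this sketch the handlers play the role of the models, and the log plays the role of the data gathered for computing performance measures.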
The fundamental approach to modeling distributed processing systems and communication networks is essentially the same. A network engineer is concerned with the flow of "packets" of information, and network performance is usually measured in terms of delay, throughput, and utilization of communication resources such as trunks. The designer of a distributed processing system, on the other hand, is concerned with the flow of "jobs" or "transactions" in the system, and the performance of the system is measured by throughput, average time for completion, and utilization of computer resources such as CPU's, disks, and printers. In both cases, the designer is clearly dealing with a discrete-event system, and the focus is on allocating resources (trunks, CPU's, etc.) to multiple entities (users, processes, jobs, or transactions) in such a way that performance objectives are met within the resource constraints. Thus, although the description of the present invention will be provided in its application to higher level communications networks, those skilled in the art will readily recognize the equal applicability of the present invention to distributed processing systems.
Network simulation tools, such as BONeS® (Block Oriented Network Simulator) sold by Comdisco Systems, Inc., are commercially available for use in network design, analysis, and management in order to minimize the risk of deploying such a network. The BONeS® product and the manuals available to the public are herein incorporated by reference.
These simulators provide standard and user-defined models of the behavior of the various constituents of a network such as the one illustrated in FIGS. 1-3, including models of data traffic which are produced at each node of a given network and which provide the discrete events that drive the simulation.
The BONeS product also models the physical form (i.e. data structure) in which the data travels through the network. The data in a network typically travels from one node to another in the form of packets. Data packets include the binary data comprising the actual message being sent, but typically include additional overhead data added to the packet by the network to facilitate message routing, addressing and error detection. The exact structure of a packet will vary according to the particular network protocol being employed.
Network protocols vary according to particular suppliers, but most are structured as a layered set of protocol functions, with well-defined interfaces between layers. Each layer performs a distinct function and the implementation details for any layer are transparent to layers immediately above and below, and each layer communicates across the network only with its corresponding peer layer at other nodes.
The Open Systems Interconnection (OSI) model was developed in 1984 by the International Organization for Standardization (ISO) to provide a framework for the standardization of network protocols (Document Reference ISO 7498-1984). A description of this seven-layer model is shown in FIG. 5. Each packet originates in the Application layer (Layer 7), and at each successive lower layer the original message is further encapsulated with additional data to accomplish the function of that layer.
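By way of illustration only, the growth of a message as it is encapsulated layer by layer can be sketched as follows. The per-layer overhead values are hypothetical and chosen solely for the example; actual overhead depends on the protocol employed. The Physical layer (Layer 1) is omitted on the assumption that it carries bits without adding to the packet.

```python
# Hypothetical per-layer overhead in bytes (illustrative values only).
LAYER_OVERHEAD = [
    (7, "Application", 0),
    (6, "Presentation", 2),
    (5, "Session", 2),
    (4, "Transport", 20),
    (3, "Network", 20),
    (2, "Data Link", 18),
]


def encapsulated_lengths(message_length):
    """Return [(layer_number, layer_name, packet_length_after_layer), ...].

    The message starts at Layer 7 and each lower layer adds its own overhead.
    """
    lengths = []
    length = message_length
    for number, name, overhead in LAYER_OVERHEAD:
        length += overhead
        lengths.append((number, name, length))
    return lengths
```

Under these assumed values a 100-byte message leaves the Data Link layer as a 162-byte packet.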
The BONeS® Designer™ model of a network is specified by a hierarchical block diagram, which represents the flow of packets, messages, jobs, or transactions. FIG. 6a shows a simple BONeS® Designer™ block diagram of a two-node communication system, consisting of four blocks: two Traffic Sources, a Transmission System, and a Destination Node. This is a simple example intended to illustrate the use of such a simulation product. The Voice Traffic Source is a hierarchical block, composed of several component blocks as shown in FIG. 6b. The blocks in FIG. 6b generate packets with a uniform arrival rate and then set the Time Created field of the packet to the current simulation time, the Length field to the value of the Voice Packet Length parameter, and the Priority field to a value of 1. The packets which exit the voice source represent the packets which would be generated by a synchronous source such as digitized voice. The Data Traffic Source, shown in FIG. 6c, generates traffic with Poisson arrivals and exponentially distributed lengths, representing a statistical model for data traffic produced by a particular node on the network. Outputs from the two sources are merged and passed on by the Transmission System.
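By way of illustration only, the two statistical traffic sources described above can be sketched as follows. This is a sketch under stated assumptions and not the BONeS® blocks themselves: the "uniform arrival rate" is interpreted as constant spacing between packets, the data priority of 2 is an assumed value, and the names `Packet`, `voice_source`, `data_source` and `merged` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    time_created: float  # simulation time at which the source emitted the packet
    length: float        # packet length in bits
    priority: int


def voice_source(interval, packet_length, duration):
    """Emit fixed-length, priority-1 packets at a constant spacing (assumed reading)."""
    n = 1
    while n * interval <= duration:
        yield Packet(time_created=n * interval, length=packet_length, priority=1)
        n += 1


def data_source(rate, mean_length, duration, rng, priority=2):
    """Poisson arrivals (exponential gaps) with exponentially distributed lengths."""
    t = rng.expovariate(rate)
    while t <= duration:
        yield Packet(time_created=t,
                     length=rng.expovariate(1.0 / mean_length),
                     priority=priority)
        t += rng.expovariate(rate)


def merged(*sources):
    """Merge the outputs of several sources into one stream ordered by creation time."""
    return sorted((p for s in sources for p in s), key=lambda p: p.time_created)
```

For example, `merged(voice_source(0.02, 512, 1.0), data_source(5.0, 8000.0, 1.0, random.Random(1)))` yields one second of combined traffic for a downstream transmission model.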
The data structure representing the packet can be tailored for each type of data traffic produced (i.e. priority, length of packet, etc.). The data packets can be stamped with the time they are created by the source and with the time received by the packet destination. By comparing time stamps, delays can be computed. By adding the packet lengths of all packets transmitted by the system, throughput can be calculated. The values for the time the packet was created, priority and packet length are inserted into the data structure by the middle blocks in FIG. 6b, and the other time stamps (e.g. arrival time) are inserted as the packet passes through various points in the system.
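By way of illustration only, the computation of delay and throughput from stamped packets can be sketched as follows. The record layout `PacketRecord` is hypothetical, and dividing the summed lengths by an observation interval to express throughput in bits per second is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    time_created: float   # stamped by the source
    time_received: float  # stamped by the destination
    length: int           # packet length in bits
    priority: int


def delays(records):
    """Per-packet delay obtained by comparing the two time stamps."""
    return [r.time_received - r.time_created for r in records]


def throughput(records, observation_time):
    """Sum of delivered packet lengths divided by the observation interval (bits/s)."""
    return sum(r.length for r in records) / observation_time
```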
It should be pointed out that the data structures of packets flowing through an actual network do not have fields for time stamps, and have actual lengths and therefore do not require a field to represent message length. The time stamp fields of the data structure are used for purposes of creating a record of the flow of the packet through the system during the simulation. The network protocol is modeled through use of the message length, which accounts for any encapsulation of packets as they move through the system.
The middle block in FIG. 6a represents the Transmission System. It queues up the packets coming out of the source, allocates trunks for transmission on a priority basis, and releases packets to the destination node after a time delay, which represents transmission time. The Transmission System block is hierarchical, built with BONeS® Designer™ library models for queues and resource allocation.
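By way of illustration only, queueing with priority-based trunk allocation can be sketched as follows. This sketch assumes non-preemptive service, identical trunks, and that a lower priority number is served first; it ignores propagation time, and the function name `transmission_system` is hypothetical rather than a library model.

```python
import heapq


def transmission_system(packets, n_trunks, capacity_bps):
    """Non-preemptive priority service over identical trunks.

    packets: iterable of (arrival_time, priority, length_bits).
    Returns (arrival_time, priority, release_time) tuples in the order
    in which transmissions start.
    """
    pending = sorted(packets)     # ordered by arrival time
    trunks = [0.0] * n_trunks     # heap of times at which each trunk becomes free
    waiting = []                  # heap of (priority, arrival_time, length_bits)
    released, now, i = [], 0.0, 0
    while i < len(pending) or waiting:
        now = max(now, trunks[0])             # wait for the next free trunk
        if not waiting:
            now = max(now, pending[i][0])     # nothing queued: jump to the next arrival
        while i < len(pending) and pending[i][0] <= now:
            arrival, priority, length = pending[i]
            heapq.heappush(waiting, (priority, arrival, length))
            i += 1
        priority, arrival, length = heapq.heappop(waiting)
        release = now + length / capacity_bps  # transmission time only
        heapq.heapreplace(trunks, release)     # occupy the trunk that was free
        released.append((arrival, priority, release))
    return released
```

With a single 1000 bit/s trunk and three 1000-bit packets arriving at 0.0 (priority 2), 0.1 (priority 2) and 0.2 (priority 1), the priority-1 packet is transmitted second even though it arrived last.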
The total transmission delay is computed within the Transmission System block as the sum of the propagation time and the transmission time, the latter being equal to the packet length in bits divided by the trunk capacity in bits per second. Note that it is not necessary to actually generate the individual bits that are transmitted over the link. It is necessary only to compute the time required for transmitting a packet, which is a function of only the packet length and not the content.
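By way of illustration only, the delay computation can be written as a single function. The function and parameter names are hypothetical.

```python
def total_delay(packet_length_bits, trunk_capacity_bps, propagation_time_s):
    """Propagation time plus transmission time (length divided by capacity)."""
    transmission_time_s = packet_length_bits / trunk_capacity_bps
    return propagation_time_s + transmission_time_s
```

For example, a 12,000-bit packet on a 10 Mbps trunk with an assumed propagation time of 50 microseconds gives 1.2 ms of transmission time and a total delay of 1.25 ms.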
The Destination Node time-stamps the received packets and puts them into a sink. By averaging the difference between the time at which the packet was received and the time at which the packet was created, the average overall delay can be computed. By repeating these simulations for different traffic loads, a typical delay vs. load performance plot for the network can be produced.
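By way of illustration only, a delay versus load curve can be produced by repeating a simple simulation at several loads. This sketch assumes a single first-in, first-out trunk with Poisson arrivals and exponentially distributed packet lengths, and it ignores propagation time and priorities; the function names and default parameter values are hypothetical.

```python
import random


def average_delay(arrival_rate, mean_length_bits, capacity_bps, n_packets, seed=0):
    """Average of (time received - time created) for a single FIFO trunk."""
    rng = random.Random(seed)
    created = 0.0        # creation time of the current packet
    trunk_free_at = 0.0  # time at which the trunk finishes its current packet
    total = 0.0
    for _ in range(n_packets):
        created += rng.expovariate(arrival_rate)
        length = rng.expovariate(1.0 / mean_length_bits)
        start = max(created, trunk_free_at)             # wait if the trunk is busy
        trunk_free_at = start + length / capacity_bps   # time received at destination
        total += trunk_free_at - created
    return total / n_packets


def delay_vs_load(loads, mean_length_bits=8000.0, capacity_bps=1_000_000.0,
                  n_packets=20000):
    """Return [(load, average_delay), ...] where load is offered traffic / capacity."""
    service_rate = capacity_bps / mean_length_bits      # packets per second
    return [(rho, average_delay(rho * service_rate, mean_length_bits,
                                capacity_bps, n_packets))
            for rho in loads]
```

Calling `delay_vs_load([0.2, 0.5, 0.8])` returns the points of such a curve; the average delay rises as the load approaches the trunk capacity.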
Additional traffic sources, representing additional nodes, can be added by simply adding additional traffic source blocks to the system block diagram shown in FIG. 6a. Alternatively, traffic characteristics can be modified by changing parameter values or by creating completely new traffic source blocks.
Further specific information regarding the BONeS® network simulator may be found in the BONeS® Designer™ Introductory Overview, the BONeS® Designer™ User's Guide, and the BONeS® Designer™ Modeling Guide. All of the publicly available products, manuals and supporting documentation are hereby incorporated herein by reference and have been published by Comdisco Systems, Inc.
Traffic Models used in simulators, such as the two models employed for the Voice Traffic Source and Data Traffic Source in the above example, are typically based on some statistical model for the message size and the time between generated packets. In the example above, the Voice Traffic Source model specifies a uniform arrival rate and a fixed packet length. The Data Traffic Source model specifies traffic with Poisson arrivals and packet lengths which are exponentially distributed. The problem with these models is that they are at best gross approximations of the kind of data packets which will be generated by a particular node at any particular time under particular network load conditions.
Each network node can comprise a different computer system configuration, having its own resources and running its own designated applications programs. For example, one node can be a personal computer (PC) which has a 20 Megabyte hard drive, an 80386 CPU, a disk drive and various other resources. This node may be dedicated to running various applications such as spreadsheets, word processing, document administration, etc. Another node may comprise an engineering workstation, capable of running powerful circuit simulation applications which may involve traffic-intensive video or graphics display functions. Still other nodes may be dedicated as file servers and could be large IBM or DEC minicomputers with large disk storage, or they could simply comprise a personal computer.
The data traffic generated by each of these nodes will be radically different and will be a function of the application programs being executed by the node, the quality and quantity of the resources available at the node, the current demand on the node resources and the response times of other nodes from which they may request data. As previously discussed, the accuracy of any simulation will critically depend not only on the accuracy of its models for network constituents, but also on the models for the generation of the discrete events which exercise those network models.
In order to render more accurate models of the traffic generated by the nodes of a particular physical network, traffic analyzers have been employed to record actual data traffic on the network. Long samples can be used to create statistical models for the traffic characteristics of the particular network, and these models are then used in simulators to drive network models. Short samples have also been used to create traces which are used to recreate the exact pattern of traffic as recorded from the physical network. These traces are read by traffic generators to generate a repetitive output, used to drive the network models in simulation, which mimics the recorded traffic pattern.
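By way of illustration only, the repetitive replay of a short recorded trace can be sketched as follows. The trace format of (offset, size) pairs and the function name `replay_trace` are assumptions of this sketch and do not correspond to any particular traffic analyzer.

```python
def replay_trace(trace, trace_duration, n_packets):
    """Yield (time, size) pairs by looping a recorded trace.

    trace: list of (offset_seconds, size_bytes) with offsets in [0, trace_duration).
    The same pattern repeats every trace_duration seconds.
    """
    if not trace:
        return
    emitted = 0
    repetition = 0
    while emitted < n_packets:
        for offset, size in trace:
            if emitted == n_packets:
                return
            yield (repetition * trace_duration + offset, size)
            emitted += 1
        repetition += 1
```

The output of such a generator is identical from one repetition to the next, regardless of how the simulated network responds to it.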
There are still many accuracy problems associated with these approaches. First, a snapshot of data traffic on a network at any particular time will vary widely from a snapshot taken at any other time. Thus, generating a small, repetitive segment of measured traffic remains extremely inaccurate. Further, although statistical data gathering over a long period of time may be relatively more accurate, the peaks and valleys of activity on the network will be lost and thus those combinations of activities which may cause problems temporarily may not be ascertained.
Most importantly, the traffic models developed in the above-described manner are only for a particular physical network configuration as sampled. The characteristics of the traffic generated by each node on a network will change as the load on the network increases or decreases, as resources are altered at a particular node, or as new applications are installed at existing nodes. Thus, if the reason for running a simulation is to determine what would happen to network performance if the number of nodes is increased from the existing ten nodes to twenty, the emulation of traffic using samples from the actual network with ten nodes is no longer valid.
Although one could attempt to extrapolate the relationship between each additional node and the resulting impact on performance, this would be inaccurate and is exactly what one is attempting to determine accurately through simulation in the first place. Thus the most attractive and beneficial reason to use simulation products, the desire to evaluate and predict performance of the network as load, resources or applications are varied without the need to actually configure the system, is defeated by the inability of current traffic models based on an existing network to alter their characteristics as a function of these variables.
The present invention provides a method for modeling traffic which is based on sampled traffic for an existing network but is not fixed as the number of nodes, nodal processing resources and applications run by each node are altered. Rather, the method of the present invention provides traffic models which are based on sampled traffic for a real system but which alter their characteristics as a function of the above-described variables and thereby render more accurate predictions of network performance.
The method of the present invention recognizes that networks and distributed processing systems operate in a mode which typically involves a request message generated by, and transmitted over the network from, a node connected to the network to another node residing somewhere else on the network. The receiving node then processes the request and issues a response, transmitted over the network and back to the requesting node. Quite often, the requesting node's next request is dependent on receiving the response and may also possibly depend on the nature of the response. Of course, some messages may not be in the form of a request or a response to a request, but they may initiate action by other nodes once received or they may be completely uncorrelated to any other messages.
Most networks today comprise a series of client nodes and a relatively smaller number of server nodes. The client nodes are those which run applications programs locally and must access large data bases for the data on which to operate. The server nodes are designed to maintain the data bases to which the clients desire access and provide the clients with copies of portions of these data bases in response to their requests.
The time necessary for a client node to formulate a request will depend upon a number of factors, such as the type of application(s) being run, the contention for the local node's processing resources, the speed of those resources, the time to receive the response to a previous request if the current request is correlated to the response, and even the time for an operator to think and type.
Likewise, the time for a server node to formulate a response once a request has been received will depend on the demand and contention for its resources (which is often directly a function of the number of client nodes it must serve), the types of applications that the client nodes are running (which will govern the size, complexity and frequency of required responses), and the speed and power of its available resources.
Of course, the size of the packets which comprise the requests and responses will also impact request and response times because the network topology and protocols will impose limitations on the network with respect to transmission time between nodes. This aspect of the system will be reflected in the models of the network topology (i.e. the number of nodes and links), the network data structures, the network resources (i.e. buffers, trunks, buses, CPU's and switches) and the network protocol functions (i.e. which govern the flow of packets in the protocol layers and allocation of resources).
Thus, it is the object of the present invention to model the characteristics of traffic generated by each client node which vary as a function of the applications run by the node, the resources available to the node and the contention for those resources, the response times which govern the initiation of its requests and the processing time required to issue such a request.
The method of the present invention accomplishes this objective by modeling the traffic generated by a client node in accordance with the Client/Server Paradigm illustrated in FIG. 7. This model is specified using the four basic parameters described in FIG. 7: Client Waiting Time, Client Request Size, Server Response Time and Server Response Size.
As illustrated in FIG. 7, the Client Waiting Time is the time between the arrival of the last server node response packet and the transmission of the next client request packet. This time represents operator thinking time, typing time, and the time necessary for the local processor at the client node to evaluate the server response (if the response and the next request are correlated) and to process the next client request for transmission. The local processing time will be a function of the speed of and contention for its resources. The Client Waiting Time will also be a function of the application(s) being run on the client node and the particular operation of that application involving the request.
The Client Request Size is the size of the request packet and will also be a function of the application and the particular operation being executed. The Server Response Time is the time between the arrival of the client request packet at its destination node, and the time the server node transmits a packet in response to the client request. The Server Response Time is a function of the resource contention, application and operational circumstances as previously discussed. The Server Response Size is the response packet size and is also a function of application and operation.
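By way of illustration only, a traffic source driven by the four parameters can be sketched as follows. This is a sketch under stated assumptions and not the implementation of the invention: each parameter is supplied as a callable that returns a value, the callable `network_delay` is a hypothetical stand-in for the network model, and every request is assumed to be correlated with the preceding response.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClientServerModel:
    client_waiting_time: Callable[[], float]   # seconds
    client_request_size: Callable[[], int]     # bytes
    server_response_time: Callable[[], float]  # seconds
    server_response_size: Callable[[], int]    # bytes


def generate_exchanges(model, network_delay, n_exchanges):
    """Return (time, direction, size) packet events for one client.

    Each request is issued only after the previous response has arrived,
    so the request times shift when the network or the server slows down.
    """
    events = []
    t = 0.0  # time at which the last response arrived at the client
    for _ in range(n_exchanges):
        t += model.client_waiting_time()
        request_size = model.client_request_size()
        events.append((t, "request", request_size))
        t += network_delay(request_size)        # request travels to the server
        t += model.server_response_time()
        response_size = model.server_response_size()
        events.append((t, "response", response_size))
        t += network_delay(response_size)       # response travels back to the client
    return events
```

Unlike a replayed trace, the request times produced here depend on the delays returned by the network model, so doubling the network delay postpones every subsequent request.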
It will be obvious to one of ordinary skill in the art that the method of the present invention can be applied to modeling the transaction traffic of distributed processing systems as well. Distributed processing systems are similar in that usually a number of local processors with their own local memory run local programs and operate on data imported from a few large memory sources. The processors run their programs until they require data which they do not have locally and initiate a transaction out over a shared bus to request the data from the shared memory source. The shared memory then issues a response to the local processor's request. Access to the bus, the size and form of the transactions, as well as data consistency are governed by the bus protocol and impact request and response time. The speed of requests and responses will also be impacted by the speed of and contention for the associated processing resources, as well as the type of applications being run on the local processors.