The present invention is directed generally to a method and apparatus for recognizing and predicting transactions and particularly to a method and apparatus for recognizing and predicting transactions using regular expressions from formal language theory.
In computer networks, "information packets" are transmitted between network nodes, wherein an informational packet refers to, e.g., a service request packet from a client node to a server node, a responsive service results packet from the server node to the client node, or a service completion packet indicating termination of a series of related packets. Server nodes perform client-requested operations and forward the results to the requesting client nodes as one or more service results packet(s) containing the requested information followed by a service completion packet. A "service request instance," or merely "service request," refers to a collection of such informational packets (more particularly, service request packets) that are transmitted between two computational components to perform a specified activity or service. Additionally, a group of such service requests issued sequentially by one or more users that collectively result in the performance of a logical unit of work by one or more servers defines a "transaction occurrence". In particular, a transaction occurrence may be characterized as a collection of service requests wherein either each service request is satisfied, or none of the service requests are satisfied. Moreover, the term "transaction" is herein used to describe a template or schema for a particular collection of related transaction occurrences.
It would be desirable to have a computational system to recognize occurrences of transactions and analyze the performance of the transaction occurrences. Accordingly, it is important that such a system be capable not only of recognizing the occurrences of a variety of transactions, but also of associating each such transaction occurrence with its corresponding transaction.
In practice, there are several common variations in the occurrences of a given transaction. These variations are: (a) a service request (or group of service requests) may be omitted from a transaction occurrence; (b) a service request (or group of service requests) may be repeated in a transaction occurrence; and (c) a transaction occurrence may include a service request (or group of service requests) selected from among several possible service requests (or groups of service requests). For example, a transaction occurrence that queries a network server node for retrieving all employees hired last year is likely to be very similar to a transaction occurrence that retrieves all employees that were hired two years ago and participate in the company's retirement plan. These variations are often difficult to account for because, though the number of distinct transactions is typically small, the number of transaction occurrence variations can be virtually unlimited. Accordingly, it is often impractical to manually correlate each variation back to its corresponding transaction.
An objective of the present invention is to provide a software architecture that is able, based on a sequence of service requests, not only to recognize the occurrences of each of a variety of transactions but also to correlate the occurrences of variations of a given transaction with the transaction itself. A related objective is to provide an architecture that is able to identify occurrences of a transaction, wherein for each such occurrence, a service request (or group of service requests) that is part of the occurrence may have the following variations in a second occurrence of the transaction: (a) a service request (or a group of service requests) may be omitted from a sequence of service requests for the second occurrence; (b) a service request (or a group of service requests) may be repeated one or more times in the sequence of service requests for the second occurrence; and/or (c) a service request (or a group of service requests) for the second occurrence may be selected from among several possible service requests (or groups of service requests).
In one embodiment of the present invention, a computational system is provided for recognizing occurrences of a transaction, wherein each such occurrence is defined by a sequence of one or more service requests. The method performed in this computational system includes the steps of:
(a) reading a service request that is transmitted between computational components;
(b) combining a representation of the service request with a plurality of other service request representations to form a string of service request representations; and
(c) comparing the string of service request representations with a formal language regular expression characterizing the transaction to determine if the string corresponds to the transaction.
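The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the request names, the one-character symbols in `SYMBOLS`, and the patterns in `TRANSACTIONS` are all hypothetical, and Python's standard `re` module stands in for whatever matcher an embodiment might use.

```python
import re

# Hypothetical mapping from observed service requests to one-character symbols.
SYMBOLS = {"LOGIN": "L", "QUERY": "Q", "UPDATE": "U", "COMMIT": "C"}

# Hypothetical transactions, each characterized by a regular expression.
TRANSACTIONS = {
    "lookup": re.compile(r"LQ+"),
    "edit": re.compile(r"LQ*U+C"),
}

def recognize(requests):
    """Return the name of the transaction the request sequence matches, or None."""
    # Steps (a) and (b): read each request and combine the representations
    # into a single string.
    string = "".join(SYMBOLS[r] for r in requests)
    # Step (c): compare the whole string with each transaction's expression.
    for name, pattern in TRANSACTIONS.items():
        if pattern.fullmatch(string):
            return name
    return None
```

Here `fullmatch` is used rather than `search` so that a string is attributed to a transaction only when the entire sequence of service requests conforms to the expression.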
This methodology not only expresses transactions in a simple and precise format but also, and more importantly, predicts additional transaction occurrences that have not yet been seen. Accordingly, once a transaction is characterized as a regular expression, the characterization can be used to recognize transaction occurrences having various service request sequences, without additional manual intervention.
As will be appreciated, a regular expression is a representation of a formal language in which operators describe the occurrence and/or nonoccurrence of strings of symbols of the language. Common regular expression operators, for example, are as follows:
A formal language corresponding to a regular expression can be used to define a transaction as a language using service request representations as the symbols of the language. That is, service request representations become the "alphabet" of such a regular language, and occurrences of the transaction become string expressions represented in this alphabet. By way of example, the transaction, T, defined by the regular expression A* B+ C? D [E F G] specifies that service request A can be present 0 or more times; service request B must be present 1 or more times; service request C may be absent or present only once; service request D must be present only once; and only one of service requests E, F, and G must be present. Only if all of these conditions are met, in the specified order, will an occurrence of transaction T be recognized.
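The example transaction T can be checked directly with an ordinary regular expression engine. The sketch below assumes that each service request is represented by a single character and uses Python's `re` syntax, in which the alternatives E, F, and G are written as the character class `[EFG]`; the function name is hypothetical.

```python
import re

# Transaction T from the text: A* B+ C? D [E F G],
# with one character per service request.
T = re.compile(r"A*B+C?D[EFG]")

def is_occurrence_of_t(string):
    """True when the whole string of service request symbols matches T."""
    return T.fullmatch(string) is not None

print(is_occurrence_of_t("AABBCDE"))  # True
print(is_occurrence_of_t("BDG"))      # True: A and C are omitted
print(is_occurrence_of_t("ABCCDE"))   # False: C appears twice
print(is_occurrence_of_t("ABDEF"))    # False: more than one of E, F, G
print(is_occurrence_of_t("ACDE"))     # False: B is missing
```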
The characterization of a transaction as a regular language can be done either manually, or automatically by a computer. For example, a suitable computational technique can be devised to recognize strings of service request representations denoting the same transaction by:
(a) collecting, over a particular time period, service request instance data transmitted to and from an identified process or computational session;
(b) normalizing the data for each service request instance so that known variations in the service request instances (e.g., different database query values for the same data record field) that are not pertinent to identifying transaction instances are removed or masked, thereby providing "normalized request instances" that are similar to templates of service request instances;
(c) partitioning the service request instance data into one or more subsets, wherein each subset is expected to be a representation of an instance of a transaction; and
(d) determining a regular expression characterization for each partition based on an examination and generalization of repeated service request instance data collections, human understanding of the transactions being performed, the source of the service request instances, and/or the data fields within the service request instances.
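The normalizing step (b) can be illustrated with a short sketch. It assumes, purely for illustration, that service requests are SQL-like text and that the variations to be masked are quoted string literals and numeric literals; the function name `normalize` and the `?` placeholder are hypothetical choices rather than anything prescribed here.

```python
import re

def normalize(request_text):
    """Mask literal values so requests differing only in query values coincide."""
    # Replace quoted string literals with a placeholder.
    masked = re.sub(r"'[^']*'", "?", request_text)
    # Replace standalone numeric literals with the same placeholder.
    masked = re.sub(r"\b\d+(\.\d+)?\b", "?", masked)
    # Collapse runs of whitespace and fold case.
    return " ".join(masked.split()).upper()
```

Under these assumptions, two queries that select on the same field with different values reduce to one normalized request instance, which can then serve as a single symbol when a regular expression is determined in step (d).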
Regarding the reading step, mentioned hereinabove, and performed by the computational system of the present invention, this step can include a substep of selecting a category or "bin" to which an individual service request (or group thereof) can be assigned. In particular, such a categorization of a service request may be determined based on at least one of a source process and a destination process of the service request. For example, in a client-server network, service requests generated by users at client nodes may be assigned to a number of bins, such that each bin includes only those service requests generated by a single user. In particular, each bin includes service requests identified by a collection of related processes, denoted a "thread" in the art, wherein the related processes transmit service requests from, e.g., a single user to a particular server. That is, a "thread" may be considered as a specific identifiable connection or session between a client node and a server or service provider node of a network. Moreover, a thread is preferably identified such that it accommodates only one service request on it at a given point in time. Typically, each thread may be identified by a combination of client (source) and server (destination) nodes. As will be appreciated, in some applications a single network node address (of the source and/or destination) is not an adequate identifier of a thread because there can be multiple sessions or processes executing on a given network node, thereby generating multiple threads. In such cases, connection or session identification information for communicating with a server node can be used in identifying the thread to which the service packet corresponds.
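By way of illustration, the assignment of service requests to bins might be sketched as below. The `Packet` fields and the choice of a (client, server, session) key are assumptions made for this example; an actual embodiment would use whatever connection or session identification the network protocol exposes.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    # Hypothetical fields describing one observed service request.
    client: str
    server: str
    session: int
    request: str

def assign_to_bins(packets):
    """Group requests by thread, keyed on (client, server, session)."""
    bins = defaultdict(list)
    for p in packets:
        # The session component separates multiple threads that share
        # the same pair of node addresses.
        bins[(p.client, p.server, p.session)].append(p.request)
    return dict(bins)
```

Including the session identifier in the key reflects the observation above that node addresses alone may not distinguish threads when several sessions run on one node.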
Moreover, a thread can be either a client (user) thread, which is a thread that is identifiable with a specific client computer or user identification, or a shared thread, which is a thread shared among multiple client computers (users).
Still referring to the reading step, to determine whether the read service request is part of a string of service requests corresponding to an occurrence of a transaction, the time interval between:
(a) the service request that is nearest in time to the read service request (e.g., the last service request in a sequence of service requests); and
(b) the read service request
is compared against a predetermined time interval. If the time interval is less than the predetermined time interval, the read service request is considered to be a part of a common occurrence of a transaction with the nearest service request. If the time interval is more than the predetermined time interval, the read service request is not considered to be a part of a common transaction occurrence with the nearest service request.
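The time interval comparison can be sketched as a simple segmentation of one thread's requests. The sketch assumes that requests arrive as (timestamp, symbol) pairs sorted by time and that timestamps are plain numbers; the names `split_by_gap` and `max_gap` are hypothetical. The text does not say how an interval exactly equal to the predetermined interval is treated, so the sketch arbitrarily starts a new occurrence in that case.

```python
def split_by_gap(events, max_gap):
    """Split (timestamp, symbol) pairs, sorted by time, into candidate occurrences.

    A request joins the current occurrence only when the gap to the
    previous request is less than max_gap; otherwise a new one starts.
    """
    occurrences = []
    previous_time = None
    for timestamp, symbol in events:
        # Assumption: a gap equal to max_gap also starts a new occurrence.
        if previous_time is None or timestamp - previous_time >= max_gap:
            occurrences.append([])
        occurrences[-1].append(symbol)
        previous_time = timestamp
    return ["".join(group) for group in occurrences]
```

Each returned string is a candidate transaction occurrence that can then be compared with the regular expressions characterizing the known transactions.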
Because a service request may be represented as an extremely long text string and can therefore be inefficient to work with and clumsy to use in matching to a regular expression for a transaction, a unique identifier can be provided for identifying each service request. Note that such an identifier can be a symbol, such as an alphabetical or numerical symbol or sequence thereof.
Further note that the request identifier of a service request is different from the bin in which it is included in that the service request identifiers become the symbols or alphabet of the transaction regular expression according to the present invention.
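One possible way to assign such identifiers is sketched below. The class name `SymbolTable` and the naming scheme (A through Z, then AA, AB, and so on) are assumptions made for illustration only.

```python
import itertools
import string

class SymbolTable:
    """Assign each distinct service request a short unique identifier."""

    def __init__(self):
        self._ids = {}
        self._fresh = self._generate()

    @staticmethod
    def _generate():
        # Yields A..Z, then AA, AB, ... (a hypothetical naming scheme).
        for length in itertools.count(1):
            for letters in itertools.product(string.ascii_uppercase, repeat=length):
                yield "".join(letters)

    def symbol_for(self, request_text):
        """Return the identifier for a request, creating one on first sight."""
        if request_text not in self._ids:
            self._ids[request_text] = next(self._fresh)
        return self._ids[request_text]
```

With this scheme the first 26 distinct requests receive single-letter identifiers. Beyond that, identifiers span several characters, so a string of identifiers would need a separator or a fixed identifier width to remain unambiguous when matched against a regular expression.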
Another embodiment of the present invention is directed to a system for identifying occurrences of transactions from sequences of service requests using regular expressions. The system includes the following components:
(a) a means for reading a service request that is transmitted between computational components (e.g., on a communications line between a client and a server node of a network, or between two servers);
(b) a means for combining a representation of a service request with a plurality of other service request representations to form a string of service request representations wherein the string may be representative of a transaction; and
(c) a means for comparing the string of service request representations with a regular expression characterizing a transaction to determine if the string corresponds to an occurrence of the transaction. As will be appreciated, the reading means, combining means, and comparing means are typically implemented on the same processor, or on a number of interlinked processors.
Other features and benefits of the present invention will become evident from the accompanying detailed description and drawings.