1. Field of the Invention
The present invention relates to processing asset information efficiently, and more particularly, to efficient processing of assets in an asset management system.
2. Description of the Related Art
Many asset management systems receive high-volume information feeds consisting of individual messages that report asset movements, change of state or confirmation of unchanged state, aggregations and disaggregations with other assets, and the like.
In a simple asset management system, each message or update is about a unique “atom” or object. Thus, the transaction processing a single message needs to lock only the single object to which the message refers. However, for a system managing a large number of assets with various attributes and states, a message may be about an asset, but the asset may be related (possibly by containment or a physical linkage) to other assets; those assets that are related to one another are known as an “aggregation” as will be more fully defined below.
To process information feeds with aggregated data, the asset management system needs to process a very large number of updates per second; to achieve this, the updates are processed in many concurrent threads. The updates are frequently processed on a “cluster” of separate computers, and the data is held and processed, for example, in a relational database.
Conventionally, such a system uses a message queue to process the high-volume information feeds. Standard message queues allow multiple sources of information to be received, aggregated, and distributed efficiently to many different nodes in a cluster, and to many threads in each node. A message queue distributes messages as fast as possible to any thread waiting to process a message.
However, when messages about the same object are received by the system at about the same time, two problems arise: concurrent updating and ordering.
The problem associated with concurrent updating arises when separate threads are processing information about the same entity. In these circumstances, it becomes critical that no two threads attempt to update the same data record at the same time. If this condition occurs, the data is likely to become corrupt, causing the system to behave incorrectly.
One conventional approach to resolving the concurrent update problem is locking the rows that each transaction needs to update. Locking may take the form of pessimistic or optimistic locking.
In pessimistic locking, a transaction locking a row locks out access to that row to all other transactions that want to read, modify, or lock that row. All other transactions wanting to access the row in any way must wait until the transaction that owns the lock releases it. This scheme can lead to a state known as “deadlock.” Consider an example in which a first transaction needs rows 1 & 2, and a second transaction needs rows 1 & 2. If the first transaction gets a lock on row 1 and the second transaction gets a lock on row 2, both transactions will wait indefinitely for the row each transaction is missing. Avoiding this sort of deadlock in a pessimistic locking scheme requires complicated, custom, and hard-to-debug code in a system. One known approach is to ensure that threads always take out locks on rows in the same order. However, doing so can be difficult if not impossible when complex, dynamic relationships exist between the objects that need to be locked. In addition, pessimistic locking reduces system throughput when many transactions are queued up waiting for a lock on the same row.
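The consistent lock ordering approach mentioned above can be sketched as follows. This is a minimal illustration only, assuming in-memory locks as a stand-in for database row locks; the names update_two_rows and run_demo, and the quantities stored in each row, are hypothetical and do not come from any particular system.

```python
import threading
from contextlib import ExitStack


def update_two_rows(rows, locks, src, dst, count):
    """Move `count` units from row `src` to row `dst` while holding both row locks."""
    with ExitStack() as stack:
        # Always acquire locks in ascending row-id order, regardless of which
        # row the caller names first, so two transactions can never each hold
        # the lock that the other is waiting for.
        for row_id in sorted({src, dst}):
            stack.enter_context(locks[row_id])
        rows[src] -= count
        rows[dst] += count


def run_demo():
    # Hypothetical rows: row id -> quantity held.
    rows = {1: 100, 2: 100}
    locks = {row_id: threading.Lock() for row_id in rows}
    # The two transactions name the rows in opposite orders, which is the
    # pattern that deadlocks when locks are taken in the order named.
    workers = [
        threading.Thread(target=update_two_rows, args=(rows, locks, 1, 2, 10)),
        threading.Thread(target=update_two_rows, args=(rows, locks, 2, 1, 5)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return rows
```

The sketch works only because the full set of rows is known before any lock is taken; when the rows to be locked are discovered through complex, dynamic relationships, a fixed order of this kind is much harder to guarantee.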
In optimistic locking, many transactions can have a lock on the same row at once, on the theory that potential problems are detected when updates are issued to the database. This process is implemented by logic to detect when a row with a lock on it has changed. If the system detects that a change has been made to a row, it will either fail the transaction or back out of the transaction and restart it. In a system in which conflict for rows is low, this process will maximize throughput. However, whenever there is conflict, all but one transaction is guaranteed to fail. Failure can be expensive, especially in terms of CPU and resource cost. Not only does each failed transaction have to roll back all the updates that it made up to backing out, which is typically several times the cost of making the update, but the transaction also has to be performed for a second time. Thus, high-volume information feeds cannot be supported well by either a pessimistic or optimistic locking scheme.
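The change detection and restart described above is commonly implemented with a per-row version number. The following is a minimal sketch under that assumption; the Table class, the StaleRowError exception, and the state strings are hypothetical, and the in-memory dictionary merely stands in for a database table.

```python
class StaleRowError(Exception):
    """Raised when a row changed after the transaction read it."""


class Table:
    """Hypothetical in-memory table: row id -> (version, value)."""

    def __init__(self):
        self.rows = {}

    def read(self, row_id):
        return self.rows[row_id]

    def write_if_unchanged(self, row_id, expected_version, new_value):
        version, _ = self.rows[row_id]
        if version != expected_version:
            # Another transaction updated the row first; the caller must
            # fail or restart.
            raise StaleRowError(f"row {row_id} is at version {version}")
        self.rows[row_id] = (version + 1, new_value)


def demo():
    table = Table()
    table.rows[1] = (0, "in warehouse")

    # Two transactions read the same row before either one writes.
    version_a, _ = table.read(1)
    version_b, _ = table.read(1)

    table.write_if_unchanged(1, version_a, "loaded")  # first writer succeeds

    attempts = 1
    try:
        table.write_if_unchanged(1, version_b, "shipped")  # stale version
    except StaleRowError:
        # The losing transaction must re-read and repeat its work.
        attempts += 1
        version_b, _ = table.read(1)
        table.write_if_unchanged(1, version_b, "shipped")
    return table.read(1), attempts
```

In this sketch the second transaction performs its work twice, which illustrates the added cost incurred whenever two transactions conflict over the same row.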
Another problem arises when messages are processed out of order. Conventionally, threads are not scheduled deterministically by an operating system, and scheduling is very rarely coordinated between different computers. As a result, two messages about a single asset that are received from the information feed at about the same time may be executed by different threads in either order. For example, if the two messages are M1 at time t and M2 at time t+1, M1 may be processed first, followed by M2, or vice versa. If M2 is processed first, its processing will not benefit from the knowledge in message M1. This misordering may cause the system to believe that there is a business exception. As a result of processing the messages out of order, the state of the system becomes invalid.
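The effect of processing M2 before M1 can be illustrated with a small, deterministic simulation. The sketch below is illustrative only; the event names "received" and "shipped", the ALLOWED transition table, and the function names are hypothetical assumptions and are not taken from any particular asset management system.

```python
# Hypothetical rule set: an asset must be received before it can be shipped.
ALLOWED = {
    None: {"received"},
    "received": {"shipped"},
}


def apply_event(state, event):
    """Return the new state, or raise if the event is not valid in this state."""
    if event not in ALLOWED.get(state, set()):
        raise ValueError(f"business exception: {event!r} not allowed in state {state!r}")
    return event


def process(messages):
    """Apply messages in the order given and collect any business exceptions."""
    state = None
    errors = []
    for event in messages:
        try:
            state = apply_event(state, event)
        except ValueError as exc:
            errors.append(str(exc))
    return state, errors


# M1 = "received" at time t, M2 = "shipped" at time t+1.
in_order = process(["received", "shipped"])
out_of_order = process(["shipped", "received"])
```

When the messages are applied in order, the final state is "shipped" and no exception is raised. When M2 is applied first, it is rejected as a business exception and the asset is left in the state "received", which no longer reflects what the feed reported.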
In addition, the ordering problem is exacerbated by the ability of the message queue to distribute messages extremely quickly to many different threads. For example, imagine that a group of twenty messages arrives in a queue, where the first and last (twentieth) messages concern the same asset. A conventional message queue would distribute each message to one of ten threads on each of two computers in a cluster, and it would do so nearly simultaneously. Given a small randomness in the behavior of threads, it is quite possible for the last (twentieth) message to start processing before the first message. As a result, the last message would own the lock on the database for the rows associated with the asset, guaranteeing that the first message will either be processed after the last message (using pessimistic locking) or fail (using optimistic locking).
In addition, traditional, simple systems make no attempt to recover from the errors or invalid states that are caused by such misordering. As a result, the quality and accuracy of the data in these systems are low.
Other conventional systems attempt to handle out-of-order messages by inferring missed messages. For example, when such a system receives a message, the system can infer that one or more earlier messages are missing. The system could then “compensate” for the missing message(s) by filling in with a tentative, inferred message. Then, when a message arrives that matches an inferred message, the system would remove the inferred message, replacing it with the known message. Alternatively, if a message arrives that occurred between two messages that have been previously processed, the processed messages are modified and the new message is inserted between them. However, this method is complex and costly in terms of processing, disk, and network overheads compared to the processing performed if the messages had arrived in the correct order.
Another conventional solution to this problem is to decrease the number of messages that a message queue will dispatch at any one time. However, this approach only reduces the likelihood of messages being processed out of order; it does not eliminate that likelihood unless the system as a whole processes only a single message at any one time. This approach is suitable only for systems that receive very low numbers of messages and thus would not be effective for processing a very large number of updates per second.
A further complication of using an asset management system as described herein is the fact that an asset may be related (possibly by containment or a physical linkage) to other assets; those assets that are related to one another are known as an aggregation. Conceptually, therefore, several messages about different assets may in fact be messages about the same asset aggregation. Locking and ordering within the system must take these aggregations into account to prevent data integrity failures. This is in contrast to a simple message processing system, in which each message is about a unique atom or object.
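One way to picture how several messages about different assets can concern the same aggregation is to resolve each asset to its outermost container. The following sketch is a hypothetical illustration only and does not describe any particular system; the contained_in mapping, the asset identifiers, and the function aggregation_root are assumptions made for the example, and only containment relationships are modeled.

```python
def aggregation_root(asset, contained_in):
    """Follow containment links upward and return the outermost asset."""
    seen = set()
    while asset in contained_in:
        if asset in seen:
            # Guard against corrupt data in which containment forms a loop.
            raise ValueError(f"containment cycle at {asset!r}")
        seen.add(asset)
        asset = contained_in[asset]
    return asset


# Hypothetical containment data: asset -> the asset that directly contains it.
contained_in = {
    "case-1": "pallet-7",
    "case-2": "pallet-7",
    "pallet-7": "truck-3",
}

root_case_1 = aggregation_root("case-1", contained_in)
root_case_2 = aggregation_root("case-2", contained_in)
root_loose = aggregation_root("case-9", contained_in)
```

Here messages about case-1 and case-2 both resolve to truck-3, so any locking or ordering decision that considered only the individual asset identifiers would treat them as unrelated, whereas an asset that is not contained in anything resolves to itself.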