In today's rapidly changing marketplace, it is important for businesses of all sizes to disseminate information about the goods and services they offer. To do this efficiently and comparatively inexpensively, many businesses have set up sites on the World Wide Web. These sites provide information on the products or services the business provides; the size, structure, and location of the business; or any other type of information the business may wish people to access.
It is equally important for businesses to collect information on the people who are interested in them. These people may include customers, investors, or potential employees. One inexpensive method of obtaining data on these people and their various interests is to reconstruct each visitor's activity on the business's website. By analyzing data on visitors to its website, the business gains a clearer picture of their interests and, to some degree, of the effectiveness of the various portions of the website.
The construction and implementation of many websites, however, makes this a difficult task. Though a website may appear as a seamless entity when viewed with an Internet web browser, in truth most websites are run by a variety of servers and computers. For example, one group of servers may run applications providing support information, others may run Common Gateway Interface (CGI) applications, and still others may provide product data. This division means that a visitor may be hosted by one server at the beginning of his visit, switched to another server while navigating the website, and wind up on a third before his visit is complete.
Thus, to reconstruct a visitor's activity across the website during a single visit (a session), the data about that visitor's activity on every server that operates the website should be analyzed. Because so much data is available on each user, it is helpful to process the data feeds from these servers in real time. This makes the availability of the data critically important: if data is missing or otherwise incomplete, calculations may produce incorrect results, and it is costly to add missed data back into a data set that has already been processed. Adding to the complications, the various servers may not report their data synchronously.
Therefore, to reconstruct a visitor's session, it is critical that the system analyzing the data reported from the servers knows both what data to expect and what data is actually available. Furthermore, the system must be able to synchronize the data under scrutiny. Prior art systems for processing this session and utilization data were not necessarily aware of the type and availability of data, and would either process incomplete data or require data to be bundled and ready for processing as a batch. Additionally, these prior art systems lacked awareness of the network topology from which they received data, which in turn hampered their ability to make intelligent decisions about missing data.
Thus, there is a need for systems and methods that can process data streams from a network topology, detect gaps in a data stream to prevent the processing of incomplete data, and store incomplete data separately until it is complete and can be processed as a whole.
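To make the stated need concrete, the following is a minimal, hypothetical sketch and not the claimed system. It assumes that each server in the topology reports a batch of records for numbered time intervals, possibly out of order. The class name `GapAwareCollector` and its methods are illustrative inventions. The collector knows which servers to expect (topology awareness), reports which servers are still missing for an interval (gap detection), holds incomplete intervals in a separate pending store, and releases only complete intervals, in order.

```python
from collections import defaultdict


class GapAwareCollector:
    """Hypothetical sketch: hold per-interval data until every expected server has reported."""

    def __init__(self, expected_servers):
        # Assumed topology model: the set of servers expected to report for every interval.
        self.expected = frozenset(expected_servers)
        # Pending store: interval -> {server: records}; incomplete intervals wait here.
        self.pending = defaultdict(dict)
        # Next interval to release, so output stays in order even if reports arrive out of order.
        self.next_interval = 0

    def add(self, server, interval, records):
        """Record one server's batch for one interval."""
        if server not in self.expected:
            raise ValueError(f"unknown server: {server}")
        if interval < self.next_interval:
            raise ValueError(f"interval {interval} was already processed")
        self.pending[interval][server] = list(records)

    def missing(self, interval):
        """Return the servers that have not yet reported for the interval (the gap)."""
        return self.expected - set(self.pending.get(interval, {}))

    def release_complete(self):
        """Return (interval, merged records) for consecutive complete intervals, oldest first."""
        ready = []
        while self.next_interval in self.pending and not self.missing(self.next_interval):
            batch = self.pending.pop(self.next_interval)
            # Merge in a deterministic server order; a real system might sort by timestamp instead.
            merged = [record for server in sorted(batch) for record in batch[server]]
            ready.append((self.next_interval, merged))
            self.next_interval += 1
        return ready


if __name__ == "__main__":
    collector = GapAwareCollector(["web", "cgi", "product"])
    collector.add("web", 0, ["w0"])
    collector.add("cgi", 0, ["c0"])
    print(collector.missing(0))            # the "product" server has not reported yet
    collector.add("product", 0, ["p0"])
    print(collector.release_complete())    # interval 0 is now complete and released
```

Releasing intervals strictly in order means a late-arriving server blocks later intervals rather than letting them be processed ahead of an incomplete one. This mirrors the concern above that adding missed data back after processing is costly, at the price of added latency when a server is slow.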