The Internet has created many new opportunities for companies selling products and services, such as the opportunity to expand their market presence all over the world. This presence has allowed many of these companies not only to increase revenue, but also to expand the product lines and services offered to on-line consumers. Because of the increased demand many of these companies have experienced, they typically devote a significant amount of resources to attracting new and existing consumers to their websites.
Despite the successes many companies have experienced with their on-line offerings, few have data that identifies which consumers are most apt not only to visit their website, but also to purchase and repurchase products and services. This lack of data leaves companies unable to allocate resources effectively to attract new and existing consumers to their websites. Accordingly, it is becoming increasingly common for companies that provide on-line services to capture and analyze on-line data to enhance the effectiveness of the resources used to attract those consumers.
In more detail, on-line data may be derived from many sources, such as web logs maintained by a web server or data collected from a user's current interaction with a website. Many companies would find it advantageous to enable the consistent and timely capture and storage of such on-line data in a data warehouse. More particularly, the data could be analyzed by a company and used to make critical business decisions regarding its on-line business strategy based on consumer activity related to the website. Typically, the data collected in web logs is related to the higher-level hierarchical organization of the website itself. This site context is ever changing, and information about its previous state is almost never preserved. Because the site context is not persistent and the web log data is directly dependent upon that context, the data semantics are not persistent either. Many questions cannot be answered by web log data in the absence of site context.
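The dependence of web log data on site context can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and does not describe any particular implementation: the `SITE_CONTEXT` mapping, the `ContextualEvent` record, the `capture` helper, and all paths, categories, and product names are hypothetical. The sketch copies the business meaning of a requested URL into the stored record at capture time, so that the record remains interpretable after the site is reorganized.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical snapshot of the site hierarchy at capture time: it maps a
# URL path to the business meaning that path has at this moment.
SITE_CONTEXT = {
    "/p/1042": {"category": "Footwear", "product": "Trail Runner"},
    "/p/2001": {"category": "Outerwear", "product": "Rain Shell"},
}


@dataclass(frozen=True)
class ContextualEvent:
    """A page request stored together with the context it occurred in."""
    timestamp: datetime
    path: str
    category: str
    product: str


def capture(path, timestamp, site_context):
    """Copy the current context values for `path` into an event record."""
    ctx = site_context.get(path, {})
    return ContextualEvent(
        timestamp=timestamp,
        path=path,
        category=ctx.get("category", "unknown"),
        product=ctx.get("product", "unknown"),
    )


# Capture a request while the original site organization is in effect.
event = capture(
    "/p/1042", datetime(2001, 1, 15, tzinfo=timezone.utc), SITE_CONTEXT
)

# The site is later reorganized: the same URL now means something else.
SITE_CONTEXT["/p/1042"] = {"category": "Clearance", "product": "Garden Hose"}

# The stored event still carries the meaning the URL had when requested.
print(event.category, event.product)
```

A raw web log line would have recorded only the path `/p/1042`. After the reorganization, that path alone could no longer answer the question of which product the consumer viewed, whereas the record captured with its context still can.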
Additionally, many companies might also find it advantageous to collect such data representing current user activity in real time or near real time. Such real-time data may allow a business entity to provide consumers with enhanced personalization of content and enhanced communication with its website. Accordingly, the present invention seeks to address the above issues and provide a system and method for capturing business context data from a variety of web sources in real time or in batch.