The invention relates to information retrieval, and more particularly, to systems and methods for discovering frequent trees.
In various application domains, the demand for discovering frequently accessed subtrees from access data streams is increasing. Portals and online shopping websites are browsed by thousands of people every hour or even every few minutes. To record the browsing behavior of a user, an access data stream is generated in the form of trees, each representing traversal coverage. Continuously discovering frequently accessed subtrees over access data streams facilitates decision making for website management. For example, the nodes of a frequent subtree indicate frequently accessed pages, which can be pre-fetched to reduce future page access time. In addition, frequently accessed subtrees indicate user interest in the website and can therefore be applied to sales promotions for online shopping. Furthermore, discovering frequently accessed subtrees also benefits man-machine interface (MMI) management for a mobile electronic device, such as a mobile phone, smart phone, MP3 player, or the like. In an MMI, the nodes of a frequent subtree indicate frequently accessed items, and the organization of items can therefore be automatically adjusted in response to the discovered frequently accessed subtrees.