Since the end of the 1980s the World Wide Web (WWW), or the web for short, has become the ubiquitous application of the worldwide public network, i.e., the Internet. Myriads of web servers are set up and maintained by all sorts of institutional, academic, governmental and commercial organizations around the world, giving millions of remote users access to a huge amount of distributed information. While web servers were at first mainly used only to deliver information, they now also provide a wide variety of sophisticated interactive services. Among those interactive servers, the online commercial sites devoted to the selling of goods and services are certainly the most critical to set up. Their prime objective is to attract and retain visitors and eventually convert the largest possible fraction of them into actual customers. Visitors are, in general, all the individuals who have access to any of the web servers available on the Internet from any of a plethora of computerized devices that can connect to a wired or wireless network. This includes devices such as handheld or fixed personal computers (PC), personal digital assistants (PDA) and cellular smart telephones, all capable of running a web browser or other navigation software so that their users can browse through online sites to consult information and, possibly, complete all sorts of transactions, commercial or not.
To meet the objective of attracting more visitors and converting browsers into buyers, the designers of highly interactive websites are faced with the difficult problem of having to constantly improve the usability of their sites. Helping visitors to quickly find relevant information on a website greatly improves customer retention and loyalty. This improvement process relies on acquiring and gathering knowledge about the visitors of websites so that some form of personalization can be carried out. Making the results of information retrieval and search more aware of the context and of user interests is key to achieving this task. Also, the owner of a website generally wishes to know the profile of its visitors better so that corrective actions can be taken to widen the audience of the site. One concrete benefit of a good knowledge of user profiles is that the visibility of a website can be increased by improving its ability to be found efficiently via search engines.
The techniques traditionally used to identify remote users consist essentially of sending and retrieving cookies. Cookies are small text messages generated by a web server and sent to a web browser after a page has been requested by a remote user of the server. The browser then stores the cookies in a non-volatile memory space, generally on the hard disk of the remote requesting device. Cookies are sent back to the originating server each time a new web page is requested from that server.
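The cookie round trip described above can be illustrated with a minimal sketch using the Python standard library. The cookie name `visitor_id`, the 30-day lifetime and both helper functions are assumptions made for illustration only; they are not taken from any particular site or from the method disclosed here.

```python
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "visitor_id"  # hypothetical cookie name, for illustration only


def make_set_cookie_header(visitor_id: str) -> str:
    """Build the value of a Set-Cookie header the server sends with a page."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = visitor_id
    cookie[COOKIE_NAME]["path"] = "/"
    # Assumed 30-day lifetime so the browser keeps the cookie on disk.
    cookie[COOKIE_NAME]["max-age"] = 30 * 24 * 3600
    return cookie[COOKIE_NAME].OutputString()


def read_visitor_id(cookie_header: str) -> Optional[str]:
    """Extract the identifier from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    print(make_set_cookie_header("abc123"))
    # The browser returns the stored cookie on the next request.
    print(read_visitor_id("visitor_id=abc123; theme=dark"))
    # A browser whose cookies were flushed sends nothing back.
    print(read_visitor_id(""))
```

When the browser returns no cookie, the server in this sketch has no way of linking the request to an earlier visit.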
Also, the Internet Protocol (IP) address of the connecting device, a 4-byte identifier in the current version of the Internet Protocol (IPv4), can be retrieved and used to differentiate between remote users of a site.
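As a brief illustration of the 4-byte nature of an IPv4 address, the following sketch uses Python's standard `ipaddress` module. The function names and the sample address (taken from a range reserved for documentation) are assumptions chosen for the example.

```python
import ipaddress


def ipv4_to_bytes(text: str) -> bytes:
    """Return the 4-byte binary form of a dotted-decimal IPv4 address."""
    return ipaddress.IPv4Address(text).packed


def ipv4_to_key(text: str) -> int:
    """Return the address as a 32-bit integer, usable as a lookup key."""
    return int(ipaddress.IPv4Address(text))


if __name__ == "__main__":
    raw = ipv4_to_bytes("192.0.2.10")
    print(len(raw), list(raw))
    print(ipv4_to_key("192.0.2.10"))
```

A server could use such a key to group requests by connecting device, subject to the limitations of IP-based identification.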
There are, however, problems when using the two above techniques to identify the remote users of a website. As far as the IP address is concerned, the same individual connecting from different locations or using different terminals is counted as several different users. Conversely, a single connecting point, e.g., a family internet connection, may be used by several people who will nevertheless be identified as a single individual, although their profiles and interests are likely to be very different.
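The two failure modes of IP-based identification can be made concrete with a small sketch. The visit log below is invented purely for illustration (names and addresses are hypothetical, with addresses taken from ranges reserved for documentation); a real server would of course not know the person behind each request.

```python
from collections import defaultdict

# Hypothetical visit log: (actual person, IP address seen by the server).
VISITS = [
    ("alice", "198.51.100.7"),   # alice at home
    ("alice", "203.0.113.20"),   # alice at work: looks like a new user
    ("bob", "192.0.2.10"),       # bob and carol share one family connection
    ("carol", "192.0.2.10"),
]


def addresses_per_person(visits):
    """Map each person to the set of IP addresses they connected from."""
    result = defaultdict(set)
    for person, ip in visits:
        result[person].add(ip)
    return dict(result)


def people_per_address(visits):
    """Map each IP address to the set of people who connected from it."""
    result = defaultdict(set)
    for person, ip in visits:
        result[ip].add(person)
    return dict(result)


if __name__ == "__main__":
    # One individual split into several apparent users.
    print(addresses_per_person(VISITS)["alice"])
    # Several individuals merged into one apparent user.
    print(people_per_address(VISITS)["192.0.2.10"])
```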
As for cookies, all web browsers now provide means to flush them manually (clearing history) and to personalize the way they are automatically handled by the browser. Remote users are now generally well aware of the cookie mechanisms that commercial sites can use to deliver unwanted advertisements and offers. To prevent this from happening, and more generally to prevent their privacy from being jeopardized, many users periodically or systematically flush cookies from their browsers, which renders this technique somewhat ineffective.
WO-A2-2007/001397 discloses a method and system for identifying users and detecting fraud by use of the Internet. This publication is dedicated purely to the personal identification of users, since it addresses fraudulent use. To achieve this goal, it collects identification data as an aggregate of information that helps identify one user device without ambiguity. This reference is strictly directed towards the identification of customer computers.
It is an object of the invention to provide improved techniques to better identify profiles of remote users of websites.
To reach this goal, the invention advantageously combines specific hierarchical steps and characteristics about the navigation path of the user. This association enables the invention to save computing resources when a decision about an identification can be taken at an early stage and, at the same time, to perform a refined analysis of a navigation behavior each time it is helpful.
Further objects, features and advantages of the present invention will become apparent to those skilled in the art upon examination of the following description in reference to the accompanying drawings. It is intended that any additional advantages be incorporated herein.