The World Wide Web (WWW) is an enormous and ever-growing source of information on the Internet, supplied and used by companies, organizations and private individuals.
Most pages on the WWW are based on documents written in HTML (Hypertext Markup Language) or XML (Extensible Markup Language). HTML is defined as an application of SGML (Standard Generalized Markup Language), and XML is a simplified subset of SGML. SGML is not a programming language, but rather a text processing standard for marking up the structure as well as the contents of documents.
An increasing number of companies are in the business of collecting information from a large number of web sites and presenting it (often reformatted to a common layout) on a single web site.
In order to collect information from web sites constructed in many different ways, these companies have to design a specific program (a “robot”) for each web site to decode the HTML (or XML) documents and extract the desired information (e.g. model, mileage and price for a number of used cars for sale). Implementing these robots is tedious and very time-consuming, and it requires skilled programmers, preferably working in the Java programming language.
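As a rough illustration of why such hand-written robots are costly, the following minimal sketch shows a robot written for one hypothetical used-car site. The class name UsedCarRobot, the Car record and the assumed table markup (one row per car with model, mileage and price cells) are illustrative assumptions, not taken from any real web site or from the prior art discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical site-specific robot: the markup it expects is an assumption,
// not taken from any real web site.
class UsedCarRobot {

    // One extracted listing: model, mileage and price.
    static final class Car {
        final String model;
        final int mileage;
        final int price;

        Car(String model, int mileage, int price) {
            this.model = model;
            this.mileage = mileage;
            this.price = price;
        }
    }

    // Matches rows of the assumed form
    // <tr><td>MODEL</td><td>MILEAGE</td><td>PRICE</td></tr>
    private static final Pattern ROW = Pattern.compile(
        "<tr>\\s*<td>([^<]+)</td>\\s*<td>(\\d+)</td>\\s*<td>(\\d+)</td>\\s*</tr>");

    // Returns every listing found in the page; pages that do not use
    // the assumed row layout yield an empty list.
    static List<Car> extract(String html) {
        List<Car> cars = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            cars.add(new Car(m.group(1).trim(),
                             Integer.parseInt(m.group(2)),
                             Integer.parseInt(m.group(3))));
        }
        return cars;
    }
}
```

Because the pattern is tied to one page layout, a second web site, or a redesign of the same site, would require a new pattern and usually new code, which is the maintenance burden described above.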
A further problem with the existing generation of robots is that the data sources to be searched typically include at least two sources whose data formats vary over time. With time-varying formats, automated interpretation becomes extremely complex and time-consuming. Even when artificial intelligence is applied, the results obtained have to be weighed against the effort required to obtain them.
U.S. Pat. No. 5,999,940 discloses a web-based market place comprising a search facility that gives more or less direct access to different data sources. Market places of the described kind may offer a search across a very large body of material, in which a single search profile established at the market place addresses information stored in several different data sources. A problem of the disclosed invention is that the offered information is restricted to uniquely identified items, i.e. items which can be described and identified completely by a unique ID number, or which at least follow one specific syntax known and accepted by both the data source owner and the programmer of the search robot. In other words, only items having a common ID key can be offered at the market place, because the market place can only access information at other data sources if there is a common understanding of the representation needed to identify the individual items.
U.S. Pat. No. 5,999,940 deals specifically with the requirements imposed on the data source and the querying server, so as to define which information may be accessed by a query and which information is to be hidden from it.
In other words, dependencies exist between the market place provider and the data source owner, since a robot or agent can only search a data source whose data structure fits that robot or agent. Obviously, such a requirement significantly restricts the groupings of data which may be accessed, as data source owners are not necessarily aware of such unique IDs, if such IDs exist at all.
One of the objects of the invention is to provide a search strategy that improves the search possibilities for customers and increases the availability of data in case of breakdowns or other failures at some of the web sites providing the data.
Another object is to provide a method and a tool for building and implementing robots like the ones mentioned above much faster than usual, by using a graphical user interface to create a series of individually configured steps of action without having to write a single line of Java code.
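The idea of a robot as a series of individually configured steps can be sketched as follows. This is only an illustration under stated assumptions: the class StepRobot and the step types extractAll and replace are hypothetical and are not the steps defined by the invention. The point is that each step is generic code whose behavior is determined entirely by its configuration values, which a graphical tool could collect from the user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the step types and their parameters are hypothetical,
// not the steps defined by the invention.
class StepRobot {

    // A step transforms a list of text fragments into a new list.
    interface Step extends Function<List<String>, List<String>> {}

    // Step configured with a regular expression; keeps capture group 1
    // of every match found in every input fragment.
    static Step extractAll(String regex) {
        Pattern p = Pattern.compile(regex);
        return in -> {
            List<String> out = new ArrayList<>();
            for (String s : in) {
                Matcher m = p.matcher(s);
                while (m.find()) out.add(m.group(1));
            }
            return out;
        };
    }

    // Step configured with a literal target and its replacement.
    static Step replace(String target, String replacement) {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String s : in) out.add(s.replace(target, replacement));
            return out;
        };
    }

    // Runs the configured steps in order, starting from a single document.
    static List<String> run(String document, List<Step> steps) {
        List<String> current = new ArrayList<>();
        current.add(document);
        for (Step step : steps) current = step.apply(current);
        return current;
    }
}
```

Under these assumptions, a robot for a new web site is just a different list of configured steps passed to run, so adapting to a changed page layout means editing configuration values rather than rewriting program code.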