The present invention, in some embodiments thereof, relates to a network-based gathering of background information and, more particularly, but not exclusively, to such a method where the network is the Internet.
The Internet offers unprecedented access to information. When thinking of querying the Internet one generally thinks of a model in which a query is input by a user and leads to one or more answers provided at different locations, one or two of which are most relevant.
Another model of querying the Internet is to start with a query leading to a first site and then follow links from the first site to subsequent sites. Such a model is known as surfing and is very intuitively supported by the World Wide Web.
A third model does not assume a localized answer. A query may lead to different locations on the Internet, the locations having different information that needs to be compiled together in order to answer the query.
A common feature of all of these models is the initial query, generally provided by a user. The query is used as the basis of a search which is carried out over the Internet using considerable computing power.
The search itself is carried out using databases gathered automatically from the Internet, generally in a non-query-based general search trawling information from all over the Internet.
The need for a query makes Internet searching something of an art form and makes it difficult to provide fully automated Internet information gathering. In order to provide automated information gathering, automatic query generation is required, and for such automatic query generation some level of semantic understanding of the data is necessary. One of the problems with trying to semantically understand data on the Internet is that web sites are independently constructed and have different ways of presenting information, so that a computerized system has difficulty finding even the same information from different websites.
Projects such as the EU-funded Okkam project use semantics to help people and machines find, share and integrate information more easily. With Okkam, the main objects being scanned are no longer documents that just happen to contain certain keywords, but entities, such as people, locations, organizations or events.
The core Okkam infrastructure stores and makes available for reuse so-called global identifiers which can be applied to and used by anyone or anything across formats and applications. The project is concerned with distributed information and knowledge management.
A goal recognized by some of the stakeholders in the Okkam project is to allow a searcher to integrate Web 2.0-type information that a user may have placed on different social media sites. Okkam envisages achieving such integration by providing users with their own global identifiers. If the global identifiers are not used for any reason, then integrating the information is not possible using the Okkam methodology.
U.S. Pat. No. 8,584,139, to Ari Katz et al., filed May 22, 2007, teaches apparatus and a method for connecting incompatible computer systems. Specifically, a proxy, located on a network between one or more client applications and a server application, comprises an input unit for receiving input data from a first client application and from a server application; a modifying unit for modifying server data by insertion of client data; and a handling unit for submitting the modified content data to the client, to allow the client to review and further modify the client data within the server data and submit the modified data back to the proxy for subsequent submission to the server. The proxy thus uses client data to prefill web forms, using mapping and the like, which the client can then review and modify before submission to the server, thus avoiding substantial rekeying.
The above-cited document teaches compatibility between closed sets of computer systems but fails to discuss searching or query formulation.