Networked content, such as web pages, may come from various sources, including content entered by users. User-entered content may contain links, which other users can access or navigate to after the networked content is published. User-entered links may be internal or external. External links have addresses outside a particular domain name or set of domain names predefined as internal websites. Internal links are generally considered safe and reliable by default, while the safety of external links is difficult to guarantee.
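The internal/external distinction described above can be illustrated with a minimal sketch. The domain names in `INTERNAL_DOMAINS` and the function name `is_external` are hypothetical and chosen only for illustration. The sketch also assumes that subdomains of an internal domain count as internal and that links without a host (relative links) belong to the current site.

```python
from urllib.parse import urlparse

# Hypothetical set of domain names predefined as internal websites.
INTERNAL_DOMAINS = {"example.com", "example.org"}


def is_external(url, internal_domains=INTERNAL_DOMAINS):
    """Return True if the URL's host lies outside every internal domain."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        # Assumption: a link with no host is relative to the current site.
        return False
    # Match the domain itself or any of its subdomains, so that
    # "example.com.evil.net" is not mistaken for "example.com".
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in internal_domains
    )
```

Comparing parsed hostnames rather than raw substrings avoids treating an address such as `http://example.com.evil.net/` as internal.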
In the existing technology, one method of detecting the safety of links is as follows: when the client sends an HTTP request to an application server, the application server detects whether the content of the current web page contains external links; if it does, the application server processes the external links, for example, by filtering them out or converting them into plain text form. This detection method uniformly filters out all external links. While it filters out unsafe external links, it also filters out safe ones. Because it cannot distinguish safe external links from unsafe ones, this method is imprecise and can limit client browsing.
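The uniform filtering method described above might look roughly like the following sketch, which converts every external anchor into plain text. The names `INTERNAL_DOMAINS`, `ANCHOR_RE`, and `strip_external_links` are hypothetical, and the regular expression is a simplification that only handles anchors with double-quoted `href` attributes; a production system would more likely use a full HTML parser.

```python
import re
from urllib.parse import urlparse

# Hypothetical set of domain names predefined as internal websites.
INTERNAL_DOMAINS = {"example.com"}

# Simplified pattern: an <a> tag with a double-quoted href and its link text.
ANCHOR_RE = re.compile(
    r'<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def is_external(url):
    """Return True if the URL's host lies outside every internal domain."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # assumption: relative links are internal
    return not any(host == d or host.endswith("." + d) for d in INTERNAL_DOMAINS)


def strip_external_links(html):
    """Replace every external anchor with its plain link text."""
    def replace(match):
        href, text = match.group(1), match.group(2)
        # Keep internal anchors untouched; reduce external ones to text.
        return text if is_external(href) else match.group(0)

    return ANCHOR_RE.sub(replace, html)
```

Note that the function has no notion of whether an external link is actually unsafe, which is the imprecision the paragraph above points out.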
Another method of link detection in the existing technology is as follows: when the client sends an HTTP request to the application server, the application server detects whether the content of the current web page contains external links. If it does, the application server compares the external links against the unsafe links listed in a blacklist and processes those external links that match, for example, by filtering them out.
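As a rough illustration of the blacklist comparison, the sketch below selects only those external links whose host matches a blacklist entry. The contents of `INTERNAL_DOMAINS` and `BLACKLIST`, and the helper names, are hypothetical; the sketch also assumes that the blacklist stores domain names and that a match on a domain covers its subdomains, which the description above does not specify.

```python
from urllib.parse import urlparse

# Hypothetical configuration, for illustration only.
INTERNAL_DOMAINS = {"example.com"}
BLACKLIST = {"evil.test"}  # manually maintained list of unsafe domains


def host_of(url):
    """Return the lower-cased host of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def matches(host, domains):
    """Return True if host equals, or is a subdomain of, any listed domain."""
    return any(host == d or host.endswith("." + d) for d in domains)


def blacklisted_links(urls):
    """Return the external URLs whose host appears in the blacklist."""
    unsafe = []
    for url in urls:
        host = host_of(url)
        if not host or matches(host, INTERNAL_DOMAINS):
            continue  # internal or relative link: not checked
        if matches(host, BLACKLIST):
            unsafe.append(url)
    return unsafe
```

External links that are absent from the blacklist pass through unchanged, so the result is only as current as the blacklist itself.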
In the existing detection methods, web link safety detection is typically performed by the application servers themselves and is limited to simple checks such as domain name matching. Application servers are often unable to keep up when the volume of web page accesses is high and the safety verification logic is complex. Furthermore, manually maintained blacklists have long confirmation cycles and slow response speeds.