In the Internet age, information is nearly limitless. Methods of acquiring information have evolved from flipping through books and looking up words in dictionaries to conducting searches via search engines.
Today there is so much information that distinguishing or selecting a particular piece of it can be difficult. Many conventional methods are therefore used to automatically capture data on the Internet, classify it, and analyze it. Using automatic keyword identification techniques, the desired information is then selected from these vast quantities of data.
Conventionally, “web page data capture” involves acquiring web page data using web crawlers or similar tools and then extracting useful data from it by programmatic analysis. Extracting web page data is part of what conventional search engines do. For example, a program may extract news headlines from the news channel of Sina.com.
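The extraction step of such web page data capture can be sketched in Python using only the standard library's html.parser module. The sketch assumes the page HTML has already been fetched by a crawler and that headlines are marked with a class named "headline"; the class name, the helper names, and the sample markup are hypothetical and do not reflect the actual structure of Sina.com pages.

```python
from html.parser import HTMLParser

# Void elements never receive an end tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HeadlineExtractor(HTMLParser):
    """Collects the text of every element whose class list contains
    `marker_class` (a hypothetical marker assumed for this sketch)."""

    def __init__(self, marker_class="headline"):
        super().__init__()
        self.marker_class = marker_class
        self.depth = 0       # nesting depth inside the current marker element
        self.buffer = []     # text fragments of the headline being read
        self.headlines = []  # completed headlines, in page order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            # Nested tag inside a headline element (e.g. <a> or <b>).
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.marker_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Normalize whitespace and store the finished headline.
            self.headlines.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_headlines(html, marker_class="headline"):
    """Returns the text of all marker elements found in `html`."""
    parser = HeadlineExtractor(marker_class)
    parser.feed(html)
    parser.close()
    return parser.headlines


# Hypothetical markup standing in for a crawled news page.
SAMPLE_PAGE = """
<html><body>
  <ul class="news-list">
    <li class="headline"><a href="/n/1">First headline</a></li>
    <li class="headline"><a href="/n/2">Second <b>headline</b></a></li>
    <li class="ad">Not a headline</li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    print(extract_headlines(SAMPLE_PAGE))  # ['First headline', 'Second headline']
```

Because the headline text sits directly in the HTML, a few lines of parsing are enough to harvest it, which is exactly what makes published text easy to capture.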
On the other hand, many companies now try to prevent their online information from being acquired by others, that is, to stop other organizations or individuals from obtaining their data without authorization through web page data acquisition technology. For example, the product divisions of some companies have found that authenticated commercial licensing information published on their own websites has appeared on other, unaffiliated websites, and have concluded that the information was acquired with a web crawler or a similar tool. Such acquisition of another party's information without the consent of the authorizing party or owner is illegal. However, owners have no choice but to disclose this information on the Web in order to make their lawful status public.
Therefore, to prevent web crawlers and similar tools from capturing information disclosed on the Web, some websites have adopted a text-to-picture processing method, in which text is rendered as pictures so that it cannot be read directly from the page's HTML.
However, with this text-to-picture processing method, when a page has many “text pictures” to display, any browser must issue one Hypertext Transfer Protocol (HTTP) request per picture. Issuing so many HTTP requests significantly degrades the front-end performance of page display.
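To make the one-request-per-picture cost concrete, the following Python sketch counts the distinct picture sources in a page, which roughly approximates the number of picture requests a browser issues on a first, uncached load. It is only an estimate under stated assumptions: it ignores browser caching, connection reuse, and pictures referenced from CSS, and the /textimg URL scheme and record fields are hypothetical.

```python
from html.parser import HTMLParser


class ImageRequestCounter(HTMLParser):
    """Collects the distinct src URLs of <img> elements in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []   # distinct picture URLs, in page order
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and src not in self._seen:
            self._seen.add(src)
            self.sources.append(src)


def estimate_picture_requests(html):
    """Rough estimate of picture HTTP requests: one per distinct <img> src."""
    counter = ImageRequestCounter()
    counter.feed(html)
    counter.close()
    return len(counter.sources)


# Hypothetical page in which each field of a licensing record is a text picture.
fields = ["company", "license_no", "issuer", "valid_until"]
page = "<table>" + "".join(
    f'<tr><td>{name}</td><td><img src="/textimg?field={name}"></td></tr>'
    for name in fields
) + "</table>"

if __name__ == "__main__":
    print(estimate_picture_requests(page))  # 4
```

A single four-field record already costs four extra requests; a page listing hundreds of such records would need hundreds, which is where the slowdown comes from.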
In summary, converting text into pictures hinders web crawlers and similar tools from capturing information disclosed on the Web, but it also slows down browsers when they display web pages containing many such pictures.