The present invention relates, in general, to a method of implementing structured and non-structured data in an XML document and, more particularly, to a method of implementing both structured data stored in a database and non-structured data stored in a legacy document in an XML document using XML replacement technology.
The Internet interconnects many communication networks around the world, and computers connected to the Internet use a communication protocol called Transmission Control Protocol/Internet Protocol (TCP/IP) to communicate with each other.
HyperText Markup Language (HTML) is one of the data formats used on the World Wide Web (WWW) and is a language for writing hypermedia documents. In other words, HTML defines the logical structure of hypertext using a standard document format called Standard Generalized Markup Language (SGML), and HTML documents are stored as plain text files.
To view a specific web page using a web browser such as Internet Explorer, a user must enter the Uniform Resource Locator (URL) of the web page. Accordingly, if the user does not know the URL of the web page, it is difficult to access the target document.
Consequently, software is needed that enables a user to find desired information among the vast amount of information on the Internet without knowing the URL of each Internet site; such software is called a search engine.
A search engine operates as follows. A search robot, that is, a program called a spider, browses multiple sites on the Internet and stores the information collected from those sites in a database. When a user enters a specific search word, the web sites whose contents correspond to the search word are selected and displayed. More specifically, when the search engine receives a search request containing a keyword from a user's computer, it runs the spider program through the Common Gateway Interface (CGI).
CGI is a standard interface between a web server and external programs: the web server receives data from the web browser installed on a user's computer as input, runs an external program according to the input data, and receives the results of executing the external program. The spider program thus started retrieves search results from an index database storing URLs, information about various websites, and the like; converts the search results into HTML format; and transmits the resulting HTML documents to the user's computer.
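The CGI exchange described above can be sketched in Python as follows. Under CGI, the web server passes the query of a GET request to the external program in the `QUERY_STRING` environment variable and reads the program's response, a header block followed by a blank line and the HTML body, from its standard output. The in-memory `INDEX`, the query parameter name `q`, and the helper names `lookup`, `render_html`, and `respond` are hypothetical stand-ins for the index database and the spider program; the sketch shows only the lookup and the conversion of results into HTML.

```python
import html
import os
import sys
from urllib.parse import parse_qs

# Hypothetical in-memory index standing in for the index database:
# keyword -> list of (URL, page title) pairs.
INDEX = {
    "xml": [("http://example.com/xml", "XML overview")],
    "html": [("http://example.com/html", "HTML basics")],
}


def lookup(keyword):
    """Return the index entries for a keyword (case-insensitive)."""
    return INDEX.get(keyword.strip().lower(), [])


def render_html(keyword, results):
    """Convert search results into an HTML page, escaping all user-visible text."""
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for url, title in results
    )
    return (
        "<html><body>"
        f"<h1>Results for {html.escape(keyword)}</h1>"
        f"<ul>{items}</ul>"
        "</body></html>"
    )


def respond(query_string):
    """Build the full CGI response: header block, blank line, then the HTML body."""
    params = parse_qs(query_string)
    keyword = params.get("q", [""])[0]  # "q" is a hypothetical parameter name
    body = render_html(keyword, lookup(keyword))
    return "Content-Type: text/html; charset=utf-8\n\n" + body


if __name__ == "__main__":
    # A CGI-capable web server sets QUERY_STRING and forwards stdout to the browser.
    sys.stdout.write(respond(os.environ.get("QUERY_STRING", "")))
```

Every request in this sketch regenerates a complete HTML page from the index, which is the conversion step that the prior-art approaches discussed below perform on each search.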
In the early days of Internet services, search engine operators retrieved and classified both Internet sites and web documents and constructed a database from the classified data. Accordingly, these search engines used a directory search method, in which a user reaches specific data by narrowing down through a predefined hierarchy of subject categories by means of a subject search or a menu search.
However, the rapid growth of the World Wide Web (WWW) led to a sudden increase in the number of Internet sites, making it difficult to search for desired information effectively using the directory search method. In other words, as the WWW grows, search engines must cover an ever larger amount of data, but the approach of the existing search engines, namely manually checking each web page and storing its data in a database, cannot keep pace with that growth.
Consequently, search engines were developed that employ the above-mentioned search robot to provide a search service by retrieving and indexing web pages automatically. These search engines use a keyword (search word) search method: they find every web document related to the search word that the user enters and provide the results to the user's computer. However, the number of matching web pages is so large that the user must search through the displayed results once again to find the desired content.
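The keyword search method relies on an index built automatically from the pages the search robot has collected. A minimal Python sketch of one common structure for this, an inverted index, is shown below, assuming the crawl has already produced a hypothetical mapping from URL to page text (no network access is performed). Each word maps to the set of URLs containing it, and a query returns the pages that contain every query word. The function names and sample pages are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it.

    `pages` is a hypothetical dict of URL -> page text standing in for
    the pages a spider has already fetched.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return, sorted, the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "XML is an Internet document standard",
        "http://example.com/b": "HTML is a markup language for the Web",
        "http://example.com/c": "XML and HTML are both markup languages",
    }
    index = build_index(pages)
    print(search(index, "XML"))
    print(search(index, "markup language"))
```

Requiring every query word narrows the result set somewhat, but a single common word still matches many pages; a production search engine would also rank the matches, which this sketch does not attempt.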
Meanwhile, XML (eXtensible Markup Language) is a next-generation Internet document standard. The World Wide Web Consortium (W3C) published XML 1.0 as a W3C Recommendation in 1998. XML documents are structured so that they are easy for people to understand and easy for machines to process. XML also resolves the limitations of HTML in representing Web content and overcomes the shortcomings of SGML.
HTML, the most widely used language for representing content on the Internet, is well suited to presenting data but is limited when it comes to reusing or retrieving documents. To resolve this problem, XML has attracted attention as a next-generation Internet language because it facilitates the extensibility, compatibility, and structuring of information.
Information on the Internet is broadly divided into structured data and non-structured data. Generally, structured data is stored in a database, whereas non-structured data is stored in a legacy document.
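To make the distinction concrete, the following Python sketch places rows from a relational table (structured data) and free text from a legacy document (non-structured data) side by side in a single XML document, using the standard `sqlite3` and `xml.etree.ElementTree` modules. The table, element names, and sample values are hypothetical, and the sketch is only an illustration of the two kinds of data in one XML document, not the XML replacement technology of the present invention.

```python
import sqlite3
import xml.etree.ElementTree as ET


def fetch_structured(conn):
    """Structured data: rows with fixed columns from a hypothetical `reports` table."""
    return conn.execute("SELECT id, title, author FROM reports ORDER BY id").fetchall()


def build_xml(rows, legacy_text):
    """Combine database rows and legacy free text into one XML string."""
    root = ET.Element("document")
    structured = ET.SubElement(root, "structured")
    for row_id, title, author in rows:
        # Each column maps naturally onto its own child element.
        rec = ET.SubElement(structured, "report", id=str(row_id))
        ET.SubElement(rec, "title").text = title
        ET.SubElement(rec, "author").text = author
    # Non-structured data has no columns, so it is kept as a single text node.
    ET.SubElement(root, "unstructured", source="legacy").text = legacy_text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample data held in an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?)",
        [(1, "Q1 sales", "Kim"), (2, "Q2 sales", "Lee")],
    )
    legacy = "Meeting notes: sales grew in the second quarter."
    print(build_xml(fetch_structured(conn), legacy))
```

The structured rows can be queried element by element in the resulting XML, whereas the legacy text remains an opaque block, which reflects the retrieval difficulty described next.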
Non-structured data stored in a legacy document is harder to retrieve than structured data stored in a database. Furthermore, because a legacy document must be downloaded to a client computer before it can be viewed, the client requires storage space for the document and a dedicated viewer for the document's format.
For example, Korean Patent Application Publication No. 10-1998-0006152, which relates to a method for searching data on the Internet and building a database from the data, discloses that a database is constructed separately for a specific field of data and that a commercial retrieval service can be provided using that database. Also, Korean Patent Application Publication No. 10-2008-0015282, which relates to a web browsing system and method that add link data to an HTML document at a user's request, discloses that a user can browse and search the Web conveniently and effectively by selectively adding link data to an HTML document that is received from a specific web server at the user's request and interpreted by a web browser. However, because both of these applications convert search results into HTML format and transmit the HTML documents to a user's computer, data retrieval is slowed down. Also, if an error occurs while the search results are being received or converted into HTML format, inaccurate search results may be displayed and non-structured data may not be stored.