1. Field of the Invention
The present invention relates to data management. More specifically, the present invention relates to a system and a method for extracting content from structured or unstructured documents.
2. Related Art
Extensible Markup Language (XML), a subset of Standard Generalized Markup Language (SGML), is a set of specifications defined by the World Wide Web Consortium (W3C) to facilitate the organization and exchange of information. Storing information in well-structured XML files helps ensure reliability and interoperability among different applications across the Internet. Consequently, XML can significantly reduce the costs associated with data management and exchange by allowing data in different formats to be exchanged.
XML can also be used to define industry-specific content models. Once a content model is determined, different applications can use it to mark up information so that the information can be shared easily and effectively. For example, XML is widely used in areas such as electronic commerce, information-intensive services, and telecommunications.
Unfortunately, the majority of the information available on the Internet, especially on the Web, is either unstructured or structured in non-interoperable formats. As a result, many publicly available documents cannot be easily shared, managed, and stored. This problem is further exacerbated by the proliferation of portable devices, which often have non-uniform display mechanisms.
Hence, a need arises for a system and a method for extracting content from documents and displaying such content on portable devices.