With the development of information technologies, and especially the Internet, more and more information is stored and transmitted as electronic text in electronic documents. For example, the text of a web page is typically saved as an HTML (HyperText Markup Language) file, which is stored on a remote server and loaded onto the reader's computer when the page is viewed.
Many of the electronic texts currently available contain duplicated content. For example, the same disclaimer text may appear in each of a series of financial information disclosures. As another example, a portion of one article may be copied into various places in a series of other articles. In some situations, it is desirable to recognize and filter out duplicate portions of an electronic text in order to make reading the text more comfortable and less time-consuming.