The present invention relates generally to the field of electronic text repositories, and more particularly to data de-duplication.
An electronic text repository is a digital collection of text documents, such as a library of electronic books (eBooks) on a reader device or a collection of documents created in word processing software. Modern computing technology enables storage of, and on-demand access to, the files in an electronic text repository, limited only by the storage capacity of the device being used. Data de-duplication generally refers to compression techniques that reduce the storage size of files and documents containing repeated, i.e., duplicate, data. Repeated words in a text document are one example of duplicate data. By storing duplicate data only once, data de-duplication makes more efficient use of storage space, so that a greater number of files and documents can be stored for access and viewing on a computing device.
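As an illustration of the repeated-word example above, the following is a minimal sketch of word-level de-duplication, not a description of any particular system. It assumes a simple dictionary encoding in which each distinct token is stored once and the document is kept as a list of references to those tokens. The function names `deduplicate_words` and `restore_text` are hypothetical and chosen only for this sketch.

```python
import re


def deduplicate_words(text):
    """Store each distinct token once and represent the text as references.

    Hypothetical helper for illustration only. Tokens are runs of word
    characters or runs of non-word characters, so every character of the
    input belongs to exactly one token.
    """
    tokens = re.findall(r"\w+|\W+", text)
    vocabulary = []   # each distinct token, stored a single time
    index_of = {}     # token -> position in vocabulary
    references = []   # the document as a sequence of vocabulary positions
    for token in tokens:
        if token not in index_of:
            index_of[token] = len(vocabulary)
            vocabulary.append(token)
        references.append(index_of[token])
    return vocabulary, references


def restore_text(vocabulary, references):
    """Rebuild the original text from the vocabulary and the references."""
    return "".join(vocabulary[i] for i in references)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    vocabulary, references = deduplicate_words(sample)
    print(len(references), "tokens,", len(vocabulary), "stored once")
    print(restore_text(vocabulary, references) == sample)
```

In this sketch the savings come from the references being smaller than the tokens they replace, so the benefit grows with how often words repeat. Practical de-duplication schemes typically operate on larger units, such as blocks or whole files, rather than individual words.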