Selecting subsequences from a sequence of tokens can be used, for example, to compare sequences in order to determine whether two documents are duplicates. The tokens are typically sequences of one or more characters, words, or other symbols. However, some existing approaches to selecting subsequences are inefficient because they store and process redundant information in substantially overlapping subsequences. Some existing approaches cannot both select the same subsequences across different sequences of tokens and guarantee full coverage of each sequence by the selected subsequences.
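
By way of illustration only, the redundancy of substantially overlapping subsequences can be seen in a minimal sketch of one common selection scheme: a fixed-length sliding window with a stride of one token. The scheme, the window length of 3, the Jaccard comparison, and the names `sliding_windows` and `jaccard` are assumptions chosen for the example; they are not taken from any particular existing approach.

```python
from typing import List, Sequence, Set, Tuple


def sliding_windows(tokens: Sequence[str], k: int) -> List[Tuple[str, ...]]:
    """Return every contiguous length-k subsequence, advancing one token at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields no windows at all.
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity of two sets of subsequences (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical documents that differ in a single word token.
doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox leaps over the lazy dog".split()

windows_a = sliding_windows(doc_a, 3)
windows_b = sliding_windows(doc_b, 3)

# Each interior token is stored in up to k windows, so the windows hold
# far more tokens than the original sequence does.
stored_tokens = sum(len(w) for w in windows_a)

similarity = jaccard(set(windows_a), set(windows_b))

print(len(doc_a), len(windows_a), stored_tokens)  # 9 7 21
print(similarity)  # 0.4
```

In this sketch, the nine tokens of the first document expand to 21 stored tokens across seven windows, and a single changed word alters three of the seven windows, which illustrates the storage and processing cost of heavily overlapping subsequences.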