Organizations and individuals frequently use metadata tags to assist in categorizing digital information. Users often create metadata tags, which may range from file names to ID3 tags for MP3 files, by concatenating multiple keywords. Users may also rely on metadata tags as “road signs” that help them determine which folders, files, or network nodes contain the information or services they are searching for. Organizations and individuals often follow internally consistent conventions when creating metadata tags.
Nevertheless, the conventions used by one organization or individual may not match those used by another. Further, the structure of metadata tags may not follow the typical tokenization of a natural language such as English. Traditional procedures may rely on matching a user-provided substring against the corpus being searched rather than treating a text string as a collection of meaningful terms. For example, traditional procedures may depend on a fixed spacing and/or capitalization within a substring to detect a substring match. As a result, it may be difficult or impossible to accurately search or automatically categorize items according to their metadata tags (e.g., if the metadata tags use keywords with different capitalization and/or spacing conventions). Accordingly, the instant disclosure identifies and addresses a need for additional and improved systems and methods for tokenizing user-annotated names.
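The following minimal Python sketch illustrates the substring-matching limitation described above. It assumes, for illustration only, that a traditional procedure reduces to a verbatim substring test (Python's `in` operator); the function name `naive_substring_match` and the example tag values are hypothetical and do not describe the disclosed tokenization method.

```python
def naive_substring_match(query: str, tag: str) -> bool:
    # Hypothetical "traditional" search: the query must appear verbatim
    # in the tag, so spacing, separators, and capitalization must all agree.
    return query in tag


# Hypothetical tags naming the same content under different conventions.
tags = [
    "QuarterlyReport2019.pdf",    # CamelCase, no spaces
    "quarterly_report_2019.pdf",  # underscores instead of spaces
    "Quarterly Report 2019.mp3",  # spaces, but capitalized
    "quarterly report draft.docx",  # happens to match the query's convention
]

query = "quarterly report"
for tag in tags:
    print(f"{tag!r}: {naive_substring_match(query, tag)}")
```

Only the last tag matches, because it is the only one that happens to share the query's spacing and capitalization; the other three describe the same subject but are missed.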