Public libraries, national data warehouses, public service data banks, and historical newspaper databases often collect, categorize, and mine historical records. Metadata management is commonly used to categorize such data. For example, language tags have been used in metadata to classify, archive, categorize, and process collected international documents in text, graphic, audio, and video stream formats according to language, script, territory, and encoding. A language tag may be embedded in, or integrated with, the collected information to support networked information processing and management. In HTML and XML documents, language tags (such as the HTML lang attribute and the XML xml:lang attribute) may indicate the language of text or other content.
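The following is a minimal sketch of how such tags might be read from an XML document and split into language, script, and territory (region) categories. It assumes the document is well-formed XML whose elements carry xml:lang attributes holding BCP 47-style tags. The helpers split_language_tag and categorize are hypothetical and used only for illustration. The subtag split is a simplified heuristic, not a full BCP 47 parser: it ignores variants, extensions, and private-use subtags, and it does not handle the HTML lang attribute.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_language_tag(tag):
    """Hypothetical helper: split a simple BCP 47-style tag into subtags.

    Simplified heuristic (assumption): a 4-letter subtag is treated as a
    script, and a 2-letter or 3-digit subtag is treated as a region.
    Variants, extensions, and private-use subtags are ignored.
    """
    result = {"language": None, "script": None, "region": None}
    parts = tag.replace("_", "-").split("-")
    result["language"] = parts[0].lower()
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha() and result["script"] is None:
            result["script"] = part.title()  # e.g. "hant" -> "Hant"
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            if result["region"] is None:
                result["region"] = part.upper()  # e.g. "tw" -> "TW"
    return result


def categorize(xml_text):
    """Hypothetical helper: pair each element that has xml:lang with its subtags."""
    root = ET.fromstring(xml_text)
    categories = []
    for elem in root.iter():  # visits the root and all descendants
        tag = elem.get(XML_LANG)
        if tag:
            categories.append((elem.tag, split_language_tag(tag)))
    return categories


doc = """<archive>
  <record xml:lang="en-US">Annual report</record>
  <record xml:lang="zh-Hant-TW">年報</record>
</archive>"""

for name, subtags in categorize(doc):
    print(name, subtags)
```

Here the root archive element has no xml:lang attribute and is skipped; each record is grouped by its language, script, and region subtags. A production system would more likely rely on a complete BCP 47 parser and the IANA Language Subtag Registry than on this length-based heuristic.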