Embodiments of the present invention relate to the construction and use of unique, universal file coding in the identification of files.
Multiple devices (desktops, laptops, smart phones, MP3 players, etc.), networking options (the internet, local area networks (LAN), wide area networks (WAN), etc.), and software platforms are each capable of accessing and handling files. Files may reside upon multiple devices and within multiple storage media, including portable hard disks, flash drives, pen drives, etc. Thus, any given file or its content may be present on multiple devices, or within multiple locations (directories, file folders, etc.) on any given device.
It is known for multiple individuals to collaborate in the creation and sharing of files (documents, spreadsheets, etc.) and their content (audio, video, photos, etc.) over the internet, through emails, wikis, blogs, social networking sites, file sharing sites, or via peer-to-peer file-sharing software, etc. Multiple persons may send or receive multiple emails with the same file attachments, or the same file may be uploaded in multiple locations (for example, into team-rooms, repositories, wikis, etc.).
Thus, any file or its content may be propagated in endless permutations throughout such devices and locations. Problems arise in tracking, harmonizing and consolidating files and file versions through multiple file handling iterations. Often one file is renamed multiple times; some renamed versions retain the original format, while others are reformatted. A file may also be renamed along with a format change, such as a conversion of an audio file from Windows Media Audio (.wma) format to an MP3 (.mp3) format.
Preventing data file duplication or eliminating duplicate files (sometimes referred to as “data de-duplication”), or recognizing that files with the same name and format are actually different files with different content, may present challenges. Content comparison techniques, such as file hash comparisons and solutions which compare document text or byte data between different files, often fail to identify duplicates, for example where the same content has been saved in a different format and its byte data therefore differs. They may also falsely determine file duplications when two files with the same file name, and perhaps even similar hash code representations of their data, are actually different files.
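By way of illustration only, the limitation of hash-based content comparison noted above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular prior-art product: the helper names (content_digest, group_by_digest), the in-memory example files, and the choice of SHA-256 are all hypothetical and introduced here for illustration. It shows that byte-identical copies are grouped together regardless of file name, while the same text stored in a different encoding is not recognized as a duplicate.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_by_digest(files: dict) -> dict:
    """Map each digest to the list of file names whose bytes produce it."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(content_digest(data), []).append(name)
    return groups


# Hypothetical in-memory "files": the same text three times, but the third
# copy is stored in a different encoding, standing in for a format change.
files = {
    "report.txt": "Quarterly results".encode("utf-8"),
    "report_copy.txt": "Quarterly results".encode("utf-8"),
    "report_utf16.txt": "Quarterly results".encode("utf-16"),
}

groups = group_by_digest(files)

# Only groups with more than one member are reported as duplicates.
duplicates = sorted(sorted(names) for names in groups.values() if len(names) > 1)
print(duplicates)
```

In this sketch the renamed byte-identical copy is detected, but the re-encoded copy falls into its own group even though a reader would regard its content as the same, which is the kind of missed duplicate described above.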