1. Field of the Invention
This invention relates generally to computer operating systems and more particularly to storing and retrieving filenames in computer memory.
2. Description of the Background Art
The storing and retrieving of filenames in computer memory is extremely important to all computer users. When a computer user saves a file and filename into computer memory, it is important that the filename remain uniquely identifiable regardless of any other filenames or text encodings saved in the memory. If a filename is not uniquely identifiable, then a computer may be unable to retrieve the named file. Further, if the memory containing the filename is moved to a different computer then that filename must remain identifiable if the named file is to be retrievable.
Conventionally, a filename identity is represented by a string of bytes ("encoding") stored in computer memory. A conventional Roman character based computer system will interpret the encoding to represent Roman characters in the American Standard Code for Information Interchange (ASCII) character set, even if the encoding actually represents Japanese characters. For example, a Japanese computer user may save a file with a Japanese filename onto a removable memory device, such as a floppy disk. The Japanese filename encoding is interpreted by a conventional Japanese character based computer system to be Japanese characters. However, if the Japanese user then inserts the removable memory device into a conventional Roman character based computer system, the Roman computer system will assume the Japanese encoding actually represents a Roman character filename rather than a Japanese character filename.
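The misinterpretation described above can be illustrated with a minimal sketch. The codec names "shift_jis" and "latin-1" are assumptions chosen for illustration only; they stand in for a Japanese text encoding and a Roman text encoding and are not named in this description.

```python
# Hypothetical illustration: both codec names are stand-ins chosen for this sketch.
JAPANESE = "shift_jis"   # assumed Japanese text encoding
ROMAN = "latin-1"        # assumed Roman text encoding

original_name = "日本語"
stored_bytes = original_name.encode(JAPANESE)   # the bytes written to the disk

as_japanese = stored_bytes.decode(JAPANESE)     # read back on a Japanese system
as_roman = stored_bytes.decode(ROMAN)           # read back on a Roman system

print(as_japanese == original_name)  # the Japanese system recovers the name
print(as_roman == original_name)     # the Roman system sees different characters
```

The bytes on the disk never change; only the assumed encoding differs between the two systems.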
A problem with the conventional Roman character based computer system is that because it assumes that a filename is in Roman characters, it may equate two non-Roman character filenames as being identical. This is because a Roman computer system treats uppercase and lowercase letters in a filename as equivalent. Therefore, a Roman computer system would assume that the filenames "Example.txt" and "example.txt" (and their associated files) are the same even though they are represented by different strings of bytes. Applied to non-Roman filenames, the same rule may lead the system to assume that two filenames are identical when their encodings vary only by what the Roman system takes to be case. If a Roman computer system misinterprets a non-Roman filename, the system may mistakenly open the wrong file or may refuse to create a new file since it believes that the filename is already in use.
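A short sketch of this collision follows. It assumes, for illustration only, that the Roman system compares names by folding ASCII letters byte by byte and that the Japanese names are stored in Shift-JIS; the helper names roman_fold and roman_same_name are hypothetical.

```python
def roman_fold(raw: bytes) -> bytes:
    """Fold ASCII uppercase bytes to lowercase (assumed Roman comparison rule)."""
    return raw.lower()

def roman_same_name(a: bytes, b: bytes) -> bool:
    """Return True if a case-insensitive Roman system would equate the two names."""
    return roman_fold(a) == roman_fold(b)

# Two different two-byte Shift-JIS characters. Their second bytes are
# 0x41 and 0x61, which a Roman system reads as "A" and "a".
name_a = b"\x83\x41"
name_b = b"\x83\x61"

print(roman_same_name(b"Example.txt", b"example.txt"))           # intended behavior
print(roman_same_name(name_a, name_b))                           # unintended collision
print(name_a.decode("shift_jis") == name_b.decode("shift_jis"))  # distinct characters
```

The two Japanese names are distinct characters, yet the assumed comparison rule reports them as the same filename.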
FIG. 1 is a diagram of Japanese characters in which characters within any given column appear identical to a conventional Roman character based computer system. For example, characters 104, 106, 108 and 110 in column 102 appear identical to a prior art Roman computer system because it treats all filenames as if they were written in the Roman alphabet. Therefore, if two Japanese filenames differed by just one character, such as characters 104 and 106, a prior art Roman computer system would actually consider them to be identical. Similar problems occur with other text encodings, but the problem is most acute in Japanese and Chinese text encodings since in these languages each character is a word and therefore filenames are shorter and more likely to vary by just one character.
A Roman character based prior art system can only store filenames in Roman text encodings as partially represented by ASCII text encoding table 200 of FIG. 2. Each Roman character has its own encoding. For instance, character 202, the letter "A", is stored as 7-bit encoding 204. However, because ASCII only allows 7-bit encodings, which means that ASCII can encode only 128 characters, basic ASCII encoding table 200 contains no encodings for Japanese or any other language that uses non-Roman characters. Japanese and other East Asian languages can easily have several thousand characters that need to be encoded. Therefore, a prior art Roman character based computer system cannot always accurately store or retrieve East Asian or other non-Roman filenames.
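The arithmetic behind the 128-character limit can be checked with a brief sketch. The helper name fits_in_ascii is hypothetical and is introduced only for this illustration.

```python
ASCII_BITS = 7

# A 7-bit code can take 2**7 distinct values.
print(2 ** ASCII_BITS)

# The 7-bit pattern assigned to the letter "A".
print(format(ord("A"), "07b"))

def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has an ASCII encoding."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_ascii("Example.txt"))   # Roman filename
print(fits_in_ascii("日本語.txt"))     # Japanese filename
```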
Therefore, an improved system and method are needed to store and retrieve filenames and files in a computer system.
The present invention provides a system and method for accurately storing and retrieving filenames in computer memory by converting filenames into Unicode text encoding. The Unicode Standard, like the ASCII text encoding standard and others, encodes each character as a numerical value. However, instead of encoding simply in ASCII, Unicode text encoding encodes all the characters used in the world's major written languages, including Greek, Arabic, Tamil, Thai, Japanese, Korean and many others.
The invention stores a filename into computer memory by first determining a default text encoding based upon which it converts the filename into Unicode text encoding. If the conversion is successful, the invention stores the Unicode text-encoded filename into computer memory and sets a bit that corresponds to the default text encoding in an Encoding Bitmap located in computer memory.
If the conversion based on the default text encoding is unsuccessful, the invention tries using Roman text encoding to convert the filename into Unicode text encoding. Once the conversion is complete, the invention stores the filename into computer memory and sets the bit that corresponds to Roman text encoding in the encoding bitmap. The invention assumes that any sequence of bytes can be converted to Unicode using Roman text encoding, which assigns a meaning to every possible byte sequence. If conversion using the default encoding fails, conversion using Roman text encoding will definitely succeed, even if it produces the wrong Unicode characters.
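The storing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the Volume class, the store_filename function, and the ENCODING_BITS table are hypothetical names, the bit positions are arbitrary, and "latin-1" is used as a stand-in for Roman text encoding because it assigns a character to every possible byte.

```python
ROMAN = "latin-1"  # assumption: stand-in for Roman text encoding; decodes any bytes

# Assumption: one bit position per text encoding known to the system.
ENCODING_BITS = {ROMAN: 0, "shift_jis": 1, "big5": 2}

class Volume:
    """Hypothetical store holding Unicode filenames and an encoding bitmap."""
    def __init__(self):
        self.names = set()
        self.encoding_bitmap = 0

def store_filename(volume: Volume, raw: bytes, default_encoding: str) -> str:
    """Convert raw to Unicode, store it, and record which encoding was used."""
    try:
        name = raw.decode(default_encoding)   # first try the default encoding
        used = default_encoding
    except UnicodeDecodeError:
        name = raw.decode(ROMAN)              # fallback that cannot fail
        used = ROMAN
    volume.names.add(name)
    volume.encoding_bitmap |= 1 << ENCODING_BITS[used]
    return name

vol = Volume()
good = store_filename(vol, "日本語.txt".encode("shift_jis"), "shift_jis")
bad = store_filename(vol, b"\xff\xfe.txt", "shift_jis")  # not valid Shift-JIS
print(good, repr(bad), bin(vol.encoding_bitmap))
```

The second call shows the fallback path: the bytes are not valid in the assumed default encoding, so they are converted with the Roman stand-in and the Roman bit is set, even though the resulting characters may not be what the user intended.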
To retrieve a filename, the invention first converts the retrieval request into Unicode text encoding based on the default text encoding of the system. The invention then searches the computer memory for a matching Unicode text encoded filename. If the search is successful, the search result is returned. If the search is not successful, the invention determines if Roman text encoding is the default text encoding. If Roman text encoding is not the default text encoding, the invention uses Roman text encoding to convert the retrieval request into Unicode text encoding and then searches the computer memory for a matching Unicode filename. If the search is successful, a search result is returned.
If the search is not successful, or if Roman text encoding is the default text encoding, the invention next retrieves a list of all text encodings previously used in the system as specified in an Encoding Bitmap located in the computer memory of the system. The invention then converts the retrieval request into Unicode text encoding based on each text encoding specified in the encoding bitmap and uses each conversion to search the computer memory for a match. If a match is found, the invention returns the search result.
Finally, if the search is still not successful the invention converts the retrieval request into Unicode text encoding based on any other text encodings installed in the computer memory that have yet to be tried. The invention then uses each conversion in searching the computer memory for a matching Unicode filename. If the search is successful, the invention returns the search result. If the search is not successful, the invention returns an error message.
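The retrieval order described above can be sketched as follows. As before, this is only an illustration under stated assumptions: find_filename, to_unicode, and ENCODING_BITS are hypothetical names, the stored names are modeled as a simple set, and "latin-1" stands in for Roman text encoding.

```python
ROMAN = "latin-1"  # assumption: stand-in for Roman text encoding
ENCODING_BITS = {ROMAN: 0, "shift_jis": 1, "big5": 2}  # assumed bit layout

def to_unicode(raw: bytes, encoding: str):
    """Return raw converted to Unicode, or None if the conversion fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

def find_filename(raw_request, names, encoding_bitmap, default_encoding, installed):
    """Try encodings in the described order and return the first matching name."""
    plan = [default_encoding]                       # 1. default text encoding
    if default_encoding != ROMAN:
        plan.append(ROMAN)                          # 2. Roman, if not the default
    plan += [enc for enc, bit in ENCODING_BITS.items()
             if encoding_bitmap & (1 << bit)]       # 3. encodings in the bitmap
    plan += list(installed)                         # 4. other installed encodings

    tried = set()
    for encoding in plan:
        if encoding in tried:
            continue                                # skip encodings already tried
        tried.add(encoding)
        candidate = to_unicode(raw_request, encoding)
        if candidate is not None and candidate in names:
            return candidate
    raise FileNotFoundError("no stored filename matches the request")

names = {"日本語.txt"}
request = "日本語.txt".encode("shift_jis")
# A Roman-default system finds the name through the bitmap entry for Shift-JIS.
print(find_filename(request, names, 0b10, ROMAN, [ROMAN, "shift_jis", "big5"]))
```

Ordering the attempts this way keeps the common case, a request in the default encoding, to a single search, and consults the less likely encodings only after the cheaper attempts have failed.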
Accordingly, the present invention not only more accurately and efficiently stores and retrieves filenames in computer memory but also allows multiple encodings to be used in computer memory over time.