I. Field of the Invention
This invention relates to storing and searching for files on a computer system.
II. Background Information
Computer systems typically maintain images, documents, applications and other collections of data as files. Files are typically maintained on a hard disk drive associated with a computer; however, files may also be maintained in memory, on floppy drives, remote servers, or other types of mass storage media. Files are typically stored dispersed among numerous directories or folders. Each directory or folder may have sub-directories or sub-folders. A computer system may contain hundreds of such directories or folders, and file servers which may be a part of a computer system may have thousands of such directories or folders. A user usually locates a file by using the file's filename (typically a short character string, e.g., “family.img”); a disk drive designation and directory or folder may be prepended to the filename. While an attempt may be made to make filenames descriptive, it is virtually impossible to provide an adequate description for a file, in particular a file containing an image, in a short filename.
As used herein, a file may contain any combination of image information, text information or other information. A file may be, for example, a text document, a web page, an image, a legal decision or any other collection of text and data. The term file may include and be used interchangeably with the contents of a file.
Images are commonly stored on personal computer systems for use in presentations, documentation, artwork and other documents, for personal use (e.g., digitized photographs of family members), for use in the computer operation itself (e.g., icons and screen savers) and for other uses. These images are digitized, stored and displayed in standard formats such as .gif (Graphics Interchange Format), .bmp (bitmap format), .tif (Tagged Image File Format) and .jpg (JPEG format). Images are typically represented on a computer display as a matrix of pixels of various colors and intensities and may be printed to paper, transmitted to other computer systems, stored, and associated with documents or applications.
Images or other files may be stored in a central location, such as an image database. However, it is impractical to centralize image storage to allow for searching, as different applications often have storage location requirements for files (e.g., storing files either near to or in separate directories from applications or application data) and users may store files in various locations. Certain computer applications associate files with documents, presentations or other files produced by those applications. When associated in this manner, the filename for the associated file may be cryptic, and may be significant only to the application which is able to access and manipulate the files. Furthermore, the lack of descriptiveness in filenames makes even centrally stored files difficult to locate.
Another complication to the task of searching for images is their nature. Images are not amenable to verbal descriptions which can be used as search criteria by a computer, and are better categorized using feature recognition techniques. While humans are very good at the task of feature recognition, it is impractical to search for an image file by having a person open, display and interpret each of the thousands of images in an image database.
Computer-based feature recognition techniques exist, and may be used to search for images or other types of files stored on a computer system, but these techniques are imperfect, often focus on narrow aspects of an image as opposed to an image's holistic aspects, and are unable to translate an image into the brief verbal descriptions which humans may use to categorize images. For instance, a computer-based feature recognition module may recognize general information, textures, or hues, whereas a person would recognize an object (e.g., a rose in a vase). After analyzing a picture a computer may recognize a quality such as “bluish” where a person would generate a phrase such as “cloudless sky.” Feature recognition modules may recognize particular faces, but require a reference image, and are thus incapable of converting a verbal description (“a photo of the owner of this computer system as a child”) to a searchable criterion. Feature recognition modules may be incapable of recognizing, for example, a particular object from any of a number of angles.
Furthermore, current file search systems using feature recognition technology (for example, systems that search for images) may not allow feature recognizers to be used both selectively and with weightings. For example, current techniques do not allow a user to minimize the importance of the recognition of some features, emphasize the recognition of other features, and eliminate the consideration of still others.
When images, text documents and other information are stored in a file on a computer system by a person, the reference information the person uses to describe the file is information such as the source, content, and storage or generation date of the document, and the application associated with the document. Conventional storage methods, indexing files by filename and location, and searching using techniques such as feature extraction, record no contextual information relevant to the user or the circumstances of the storage; the search is not user-centric. Two different users, having two different sets of user-centric search criteria, are unable to express these criteria using current methods. Thus the two users use the same file search information (e.g., a sample reference image) and have the same results returned. Current systems do not allow a user to search for a file or image based on a combination of information such as file content, file creation, file source, an application or document associated with the file, a summary of the file, or other contextual information.
Therefore there exists a need for a method and system allowing a disparately stored set of files containing, for example, documents or images, to be stored in a way allowing a search based on different criteria at different times and a search where a user may emphasize certain criteria. There is a need to allow files to be searched and indexed without relying on imperfect feature extraction techniques. There is a need to allow files to be indexed based on a disparate range of information, such as source, content, creation context, and relevance to a user, and to allow a user to locate such files without the need to record a filename or location. There is a need to allow a user to tailor the criteria used in a file search, and to dictate which criteria are and which are not important in the search for a certain file.
A method and system are disclosed for storing a file to enable searching, where the file is stored in a set of files, a set of context information is extracted from the file and from the process of storing the file, the context information is stored in a set of associative information, and a reference to the file is stored in the set of associative information.
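By way of illustration only, the storing steps summarized above may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names (extract_file_context, store_file, find_files), the context field names, the use of a list of dictionaries as the set of associative information, and the exact-match lookup are all hypothetical choices made for the example.

```python
import hashlib
import os
import shutil
import time


def extract_file_context(path):
    """Context taken from the file itself (field names are illustrative)."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def store_file(src_path, file_set_dir, associative_info, **storing_context):
    """Store a file in the set of files and record its context.

    `associative_info` is assumed to be a plain list of records; the
    keyword arguments stand in for context drawn from the process of
    storing (e.g., user, source, associated application).
    """
    # Store the file in the set of files. A file with the same base name
    # would be overwritten; a fuller version would handle collisions.
    os.makedirs(file_set_dir, exist_ok=True)
    stored_path = os.path.join(file_set_dir, os.path.basename(src_path))
    shutil.copy2(src_path, stored_path)

    # Extract context from the file and from the storing process.
    context = extract_file_context(stored_path)
    context["stored_at"] = time.time()
    context.update(storing_context)

    # Store the context together with a reference to the stored file.
    record = {"reference": stored_path, "context": context}
    associative_info.append(record)
    return record


def find_files(associative_info, **criteria):
    """Return references whose context matches every given criterion exactly."""
    return [
        r["reference"]
        for r in associative_info
        if all(r["context"].get(k) == v for k, v in criteria.items())
    ]
```

In this sketch a user might call store_file("family.img", "files", info, user="alice", source="scanner") and later retrieve the file with find_files(info, source="scanner"), without recalling its filename or location. The exact-match lookup is only a placeholder for whatever search the associative information is intended to support.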