Typically, a database or content repository may store a large amount of data as files of various types such as, but not limited to, electronic documents, multimedia files, image files, and audio/music files. Efficient retrieval of content from the database or content repository may require indexing of the stored files. Conventionally, the files may be indexed based on attributes associated with the files such as, but not limited to, a file type, a file size, a file name, and a hash code (e.g., a Cyclic Redundancy Check (CRC) code). However, such attributes generally do not describe what a file contains. To support efficient natural language search, the database or content repository may therefore be required to index the files based on the content within the files. Performing such content-based indexing on multimedia files may be a non-trivial task.
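
By way of illustration only, the distinction between attribute-based and content-based indexing may be sketched as follows. This is a minimal sketch and not an implementation of any particular repository: the function names (attribute_record, build_attribute_index, build_content_index), the in-memory dictionaries, the treatment of only ".txt" files as text, and the sample files are all assumptions made for the example. The CRC value is computed with the CRC-32 function of the Python standard library.

```python
import zlib
from collections import defaultdict


def attribute_record(name: str, data: bytes) -> dict:
    """Build the attribute record for one file (name, type, size, CRC-32)."""
    file_type = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return {
        "name": name,
        "type": file_type,
        "size": len(data),
        "crc32": zlib.crc32(data),
    }


def build_attribute_index(files: dict) -> dict:
    """Map each (attribute, value) pair to the set of file names having it."""
    index = defaultdict(set)
    for name, data in files.items():
        for key, value in attribute_record(name, data).items():
            index[(key, value)].add(name)
    return index


def build_content_index(files: dict) -> dict:
    """Map each lowercase word of a text file to the file names containing it."""
    index = defaultdict(set)
    for name, data in files.items():
        if not name.endswith(".txt"):
            # Assumption: only .txt files hold words that can be split directly.
            continue
        for word in data.decode("utf-8").lower().split():
            index[word].add(name)
    return index


# Hypothetical sample files; the .mp3 bytes are a stand-in, not real audio.
files = {
    "report.txt": b"quarterly sales report",
    "talk.mp3": b"\x00\x01\x02\x03",
}
attr_index = build_attribute_index(files)
content_index = build_content_index(files)
```

In this sketch, the attribute index can answer a query such as "all files of type mp3", whereas only the content index can answer a query for a word such as "sales". The multimedia file never enters the content index because its bytes cannot simply be split into words, which is the difficulty noted above.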