The present invention relates to databases, and in particular to systems for conveniently creating, indexing and retrieving media content, including audio, image and video data and other time-sequence data, from a repository of media content.
With the advent of the Internet and the proliferation of digital multimedia technology, vast amounts of digital media content are readily available. The digital media content can be time-sequence data including audio and video data. Databases of such digital media content have grown to voluminous proportions. However, tools for conveniently and effectively storing such data and later retrieving it have not kept pace with the growth in its volume.
Attempts have been made to manage databases of video data. However, such systems make automatic and convenient indexing and retrieval of media information difficult to achieve. Further, such systems typically have a low level of retrieval accuracy. Therefore, a need clearly exists for an improved system of indexing and retrieving media content.
In accordance with a first aspect of the invention, there is disclosed a method of voice annotating digital media data. The method includes the steps of: speech annotating one or more portions of the digital media data; and indexing the digital media data and speech annotation to provide indexed media content.
Preferably, the method also includes the step of creating a word lattice using the speech annotation. It may also include the step of recording the speech annotation separately from the digital media data. Optionally, the speech annotation is generated using a formal language. Further, the annotating step can be dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar. Still further, the step of creating the word lattice may be dependent upon at least one of acoustic and linguistic knowledge.
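By way of illustration only, a word lattice of the kind referred to above can be sketched as a set of time-stamped word hypotheses, each carrying a score that combines acoustic and linguistic knowledge. The class names `Arc` and `WordLattice`, the field names, and the use of summed log-scores are assumptions made for this sketch and are not prescribed by the method.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Arc:
    """One word hypothesis spanning [start, end) seconds of the annotation."""
    word: str
    start: float
    end: float
    acoustic_score: float    # assumed: log-likelihood from an acoustic model
    linguistic_score: float  # assumed: log-probability from a language model

    @property
    def score(self) -> float:
        # Assumed combination rule: log-scores simply add.
        return self.acoustic_score + self.linguistic_score


@dataclass
class WordLattice:
    """All competing word hypotheses for one speech annotation."""
    media_id: str
    arcs: List[Arc] = field(default_factory=list)

    def add(self, arc: Arc) -> None:
        self.arcs.append(arc)

    def words(self) -> Set[str]:
        return {arc.word for arc in self.arcs}

    def alternatives(self, start: float, end: float) -> List[Arc]:
        """Hypotheses overlapping a time span, best combined score first."""
        hits = [a for a in self.arcs if a.start < end and a.end > start]
        return sorted(hits, key=lambda a: a.score, reverse=True)


# Hypothetical annotation in which "beach" and "peach" compete for one span.
lattice = WordLattice("clip-001")
lattice.add(Arc("beach", 0.0, 0.5, -2.0, -1.0))
lattice.add(Arc("peach", 0.0, 0.5, -3.0, -2.5))
lattice.add(Arc("sunset", 0.5, 1.0, -1.5, -1.0))
```

Retaining the lower-scoring alternative ("peach") rather than committing to a single transcription is what distinguishes a lattice from a one-best transcript in this sketch.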
Preferably, the method includes the step of reverse indexing the word lattice to provide a reverse index table. It may also include the step of content addressing the reverse index table.
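As a minimal sketch of the reverse indexing step, the word hypotheses of each lattice can be inverted into a table that maps every word to the media segments in which it was hypothesised. The function name `build_reverse_index`, the tuple layout, and the use of a hash map as one possible form of content addressing are assumptions of this example only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed lattice representation: (word, start_seconds, end_seconds, score).
LatticeArc = Tuple[str, float, float, float]
# Assumed table entry: (media_id, start_seconds, end_seconds, score).
Posting = Tuple[str, float, float, float]


def build_reverse_index(
    lattices: Dict[str, List[LatticeArc]]
) -> Dict[str, List[Posting]]:
    """Invert media_id -> arcs into word -> segments containing that word."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for media_id, arcs in lattices.items():
        for word, start, end, score in arcs:
            index[word].append((media_id, start, end, score))
    return dict(index)


# Hypothetical annotations for two media items.
lattices = {
    "clip-001": [("beach", 0.0, 0.5, 0.75), ("sunset", 0.5, 1.0, 0.5)],
    "clip-002": [("beach", 1.0, 1.5, 1.0)],
}
reverse_index = build_reverse_index(lattices)
```

Looking up a word in `reverse_index` returns every segment annotated with that word directly, without scanning each lattice in turn.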
In accordance with a second aspect of the invention, there is disclosed an apparatus for voice annotating digital media data. The apparatus includes: a device for speech annotating one or more portions of the digital media data; and a device for indexing the digital media data and speech annotation to provide indexed media content.
In accordance with a third aspect of the invention, there is disclosed a computer program product having a computer readable medium having a computer program recorded therein for voice annotating digital media data. The computer program product includes: a module for speech annotating one or more portions of the digital media data; and a module for indexing the digital media data and speech annotation to provide indexed media content.
In accordance with a fourth aspect of the invention, there is disclosed a method of voice retrieving digital media data annotated with speech. The method includes the steps of: providing indexed digital media data, the indexed digital media data derived from a word lattice created from speech annotation of the digital media data; generating a speech query; and retrieving one or more portions of the indexed digital media data dependent upon the speech query.
Preferably, the method further includes the step of creating a word lattice from the speech query. The word lattice may be created dependent upon at least one of acoustic and linguistic knowledge. The method may also include the step of searching the indexed media data dependent upon the speech query by matching the word lattice created from the speech query with word lattices of the indexed media data. It may also include the step of confidence filtering the lattice created from the speech query to produce a short-list for the searching step.
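The confidence filtering and searching steps can be sketched as follows. This is an illustrative simplification: confidences are assumed to lie between 0 and 1, the threshold of 0.5 is arbitrary, and the match score (the sum of products of query and annotation confidences over shared words) is one plausible choice rather than the matching rule of the method. The names `confidence_filter` and `search` are hypothetical.

```python
from typing import Dict, List, Tuple


def confidence_filter(
    query_arcs: List[Tuple[str, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Keep query words whose confidence meets the threshold (the short-list)."""
    shortlist: Dict[str, float] = {}
    for word, confidence in query_arcs:
        if confidence >= threshold and confidence > shortlist.get(word, 0.0):
            shortlist[word] = confidence
    return shortlist


def search(
    index: Dict[str, List[Tuple[str, float]]], shortlist: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank media items by accumulated confidence over shared words."""
    scores: Dict[str, float] = {}
    for word, query_conf in shortlist.items():
        for media_id, annotation_conf in index.get(word, []):
            scores[media_id] = scores.get(media_id, 0.0) + query_conf * annotation_conf
    # Highest score first; ties broken by media identifier for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical index: word -> (media_id, annotation confidence).
index = {
    "beach": [("clip1", 0.5), ("clip2", 1.0)],
    "sunset": [("clip1", 0.5)],
    "peach": [("clip3", 1.0)],
}
# Hypothetical query lattice flattened to (word, confidence) pairs.
query = [("beach", 0.75), ("peach", 0.25), ("sunset", 0.5)]

shortlist = confidence_filter(query)
results = search(index, shortlist)
```

In this example the low-confidence hypothesis "peach" is removed before searching, so the item annotated only with "peach" is never retrieved.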
Optionally, the method further includes the step of searching the indexed digital media data dependent upon a text query. Further, the speech query can be generated dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.
In accordance with a fifth aspect of the invention, there is disclosed an apparatus for voice retrieving digital media data annotated with speech. The apparatus includes: a device for providing indexed digital media data, the indexed digital media data derived from a word lattice created from speech annotation of the digital media data; a device for generating a speech query; and a device for retrieving one or more portions of the indexed digital media data dependent upon the speech query.
In accordance with a sixth aspect of the invention, there is disclosed a computer program product having a computer readable medium having a computer program recorded therein for voice retrieving digital media data annotated with speech. The computer program product includes: a module for providing indexed digital media data, the indexed digital media data derived from a word lattice created from speech annotation of the digital media data; a module for generating a speech query; and a module for retrieving one or more portions of the indexed digital media data dependent upon the speech query.
In accordance with a seventh aspect of the invention, there is disclosed a system for voice annotating and retrieving digital media data. The system includes: a device for speech annotating at least one segment of the digital media data; a device for indexing the speech-annotated digital media data to provide indexed digital media data; a device for generating a speech or voice query; and a device for retrieving one or more portions of the indexed digital media data dependent upon the speech query.
Preferably, the system also includes a device for creating a lattice structure from speech annotation. This device can be dependent upon acoustic and/or linguistic knowledge.
Preferably, the speech-annotating device post-annotates the digital media data. The speech annotation can be generated using a formal language.
The system can also include a device for reverse indexing the lattice structure to provide a reverse index table. Still further, it may include a device for content addressing the reverse index table.
Preferably, the system includes a device for creating a lattice structure from the speech query. It may also include a device for searching the indexed digital media data dependent upon the speech query by matching the lattice structure created from the speech query with lattice structures of the indexed digital media data. The system may also include a device for confidence filtering the lattice structure created from the speech query to produce a short-list for the searching device. The lattice structure can be created dependent upon at least one of acoustic and linguistic knowledge. Still further, the system may include a device for searching the indexed digital media data dependent upon a text query.
Preferably, at least one of the annotating device and the speech query is dependent upon at least one of a customised vocabulary and Backus-Naur Form grammar.
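To illustrate how a customised vocabulary and a Backus-Naur Form grammar might constrain annotations or queries, the sketch below encodes a small grammar and tests whether an utterance conforms to it. The grammar rules, the vocabulary, and the function names `match` and `accepts` are invented for this example; the grammar must not be left-recursive for this simple matcher to terminate.

```python
from typing import Dict, List, Set

# Hypothetical grammar, equivalent to the BNF:
#   <annotation> ::= <subject> "at" <place> | <subject>
#   <subject>    ::= "birthday" "party" | "sunset"
#   <place>      ::= "beach" | "home"
GRAMMAR: Dict[str, List[List[str]]] = {
    "<annotation>": [["<subject>", "at", "<place>"], ["<subject>"]],
    "<subject>": [["birthday", "party"], ["sunset"]],
    "<place>": [["beach"], ["home"]],
}


def match(symbol: str, words: List[str], pos: int,
          grammar: Dict[str, List[List[str]]]) -> Set[int]:
    """Return every position reachable after matching symbol from pos."""
    if symbol not in grammar:  # terminal: must equal the next word
        return {pos + 1} if pos < len(words) and words[pos] == symbol else set()
    ends: Set[int] = set()
    for production in grammar[symbol]:
        positions = {pos}
        for part in production:
            positions = {e for p in positions for e in match(part, words, p, grammar)}
            if not positions:
                break  # this production cannot match; try the next one
        ends |= positions
    return ends


def accepts(utterance: str, grammar: Dict[str, List[List[str]]] = GRAMMAR,
            start: str = "<annotation>") -> bool:
    """True if the whole utterance can be derived from the start symbol."""
    words = utterance.lower().split()
    return len(words) in match(start, words, 0, grammar)


# The customised vocabulary is the set of terminals used by the grammar.
VOCABULARY = {part for rules in GRAMMAR.values() for rule in rules
              for part in rule if part not in GRAMMAR}
```

Restricting annotations and queries to such a grammar narrows the set of word sequences a recogniser has to consider, which is one reason a formal language may be preferred.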