1. Field of the Invention
This invention relates in general to speech processing performed by computer systems, and in particular to a method and system for highly efficient media speech recognition processing.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) that uses relational techniques for storing and retrieving data. RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
A typical database management system includes both database files and index files. The database files store data in the rows and columns of tables stored on data pages. In such a table, the rows may correspond to individual records while the columns of the table represent attributes of the records. For example, in a customer information table of a database management system, each row might represent a different customer while each column represents different attributes of the customers, such as the name of each customer, the amount owed by each customer and the cash receipts received from each customer.
Instead of providing for direct sorting and searching of the records in the tables, the database management system relies on the index files, which contain information about, or pointers to, the location of the records in the tables stored in the database files. The index file can be searched and sorted (scanned) much more rapidly than can the database files. An index file is scanned through transactions in which criteria are stipulated for selecting records from a table. These criteria include keys, which are the attributes by which the database finds the desired record or records using the index. The actions of a transaction that cause changes to recoverable data objects are recorded in a log. In database management systems, all data are stored in tables on a set of data pages that are separate from the index file. A table can have one or more indexes defined on it, each of which is an ordering of keys of the rows of the table and is used to access certain rows when the keys are known.
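The table and index arrangement described above can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module purely as a convenient stand-in for an RDBMS; the table name customer, its column names, the index name idx_customer_name, the helper function names and the sample rows are all hypothetical and chosen only to mirror the customer information example.

```python
import sqlite3


def build_customer_db():
    """Create an in-memory database with a customer table and one index."""
    conn = sqlite3.connect(":memory:")
    # Each row is one customer record; each column is one attribute.
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " amount_owed REAL NOT NULL,"
        " cash_receipts REAL NOT NULL)"
    )
    # Hypothetical sample data, for illustration only.
    conn.executemany(
        "INSERT INTO customer (name, amount_owed, cash_receipts)"
        " VALUES (?, ?, ?)",
        [("Alice", 120.0, 30.0), ("Bob", 0.0, 75.5), ("Carol", 42.25, 10.0)],
    )
    # The index is an ordering of the name key, kept apart from the table data.
    conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
    conn.commit()
    return conn


def find_by_name(conn, name):
    """Select records using the name key as the search criterion."""
    return conn.execute(
        "SELECT name, amount_owed, cash_receipts FROM customer WHERE name = ?",
        (name,),
    ).fetchall()


def query_plan(conn, name):
    """Return the engine's own description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT name, amount_owed, cash_receipts FROM customer WHERE name = ?",
        (name,),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    conn = build_customer_db()
    print(find_by_name(conn, "Bob"))
    print(query_plan(conn, "Bob"))
```

In this sketch the query plan text names idx_customer_name, which indicates that the lookup is resolved through the index on the key rather than by scanning every row of the table.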
Large database archives, such as the ones used in audio and video libraries of media and other communications industries and educational institutions, depend on content management systems and their media indexing applications to create accurate indexes in order to locate and manage the archived content. Often, the index information has to be obtained in less than ideal situations, such as when it is extracted from audio tracks and then processed and converted by speech-to-text conversion technology. Conventional speech-to-text conversion methods work very well with closely microphoned voice input, using a trained voice and a controlled, small language set. For example, in medical radiology, speech-to-text conversion technology produces data with very high conversion accuracy. However, when this conventional speech-to-text conversion technology is applied to non-ideal conditions, such as a multi-way telephone conversation, a commercial video with background music on the audio track or an audio/video conference, the accuracy of the speech converted into text is poor and unsatisfactory.
Proper indexing is critical for efficient search and management of large archives or content collections and, therefore, it is necessary to accurately extract and translate index keywords so that they can serve as an effective input for archive indexes. Conventional techniques tend to filter or process the speech audio input, or isolate the spoken audio track prior to processing, but these methods are not possible or effective in some situations.
Recently, the technologies for filtering, audio preprocessing and speech processing, including semantic and linguistic methods, have improved. However, speech-to-text conversion accuracy is still at a level that is unacceptable for content management indexing applications. Therefore, there is a need for a simple, optimized and generic method and system for improving the efficiency of speech-to-text conversion processing, thus increasing text accuracy in media speech recognition and processing systems.