1. Field of Invention
The present invention relates to a data storing apparatus, such as a conference minutes recording system or a news gathering recording system, for storing data such as conversation audio from conferences and news gathering, images of conference and news gathering scenes, and the conference memos and news gathering memos related to these.
2. Description of Related Art
Apparatuses are known for recording and playing back the record of conferences, lectures, news gathering, interviews, and conversations using telephones or television telephones, television videos, surveillance camera videos and the like, using digital disks, digital still cameras, video tape, semiconductor memory and the like. If data is stored using these apparatuses, it is possible to record the data without losing any of the sound and images that comprise the input data.
Apparatuses of this kind include apparatuses that record digital signals transmitted via a computer network onto storage media, apparatuses that record without change analog input signals from video cameras or microphones onto storage media, and apparatuses that encode the input and convert it into a digital signal for recording.
However, a problem arises in that it is impossible to instantaneously search for the desired portion of the recorded sounds and images. Concerning this problem, tape recorders and VCRs have been proposed which make it easy for a person doing the recording to search for the important portions by attaching check marks to the important portions in the input audio signal and video signal by pressing a designated button with an arbitrary timing.
However, the check marks in this case simply designate the location of the important portions, and the problem then arises that the contents must be verified by playing back all of the checked partial audio signal and partial video signal because it is impossible to indicate to what interval portion of the audio signal or video signal these check marks correspond. Furthermore, it is necessary to perform the unnatural task of pressing buttons while listening to speech, creating the problem that it is impossible to concentrate on the speech.
Consequently, apparatuses have been proposed for storing, recording and playing back data by successively correlating input audio signals or video signals and user input data that is input at an arbitrary timing using a pen or keyboard. Using this kind of apparatus, the person doing the recording performs input using a pen or keyboard just like taking a memo and the audio signal or video signal is recorded. Afterwards, it is possible to play back the data easily by selecting the location of the desired audio signal or video signal with reference to the input memos.
For example, in Japanese Laid-Open Patent Publications 7-182365, 6-176171 and 6-343146 and ACM CHI '94 Proceedings, pp. 58-64 ("Marquee: A Tool For Real-Time Video Logging"), apparatuses are described in which the recorded audio signal or video signal and the user input data are correlated on the basis of a time stamp. During playback, one of the user input data items displayed on a screen is designated and the audio signal or video signal recorded at the time the designated user input data was recorded is played back.
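The time-stamp correlation described above can be illustrated with a minimal Python sketch. The names `Note` and `playback_offset` and the sample times are hypothetical and are not taken from the cited publications; the sketch assumes that the recording and the user input share a single clock.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """A user input item (pen stroke or typed memo) with its time stamp."""
    text: str
    timestamp: float  # seconds on the shared clock (assumption)


def playback_offset(note: Note, recording_start: float) -> float:
    """Return the offset into the recording at which playback should begin."""
    # A note entered before recording began maps to the start of the recording.
    return max(0.0, note.timestamp - recording_start)


recording_start = 1000.0  # hypothetical start time of the recording
notes = [Note("agenda", 1012.5), Note("budget decision", 1310.0)]
selected = notes[1]  # the item designated on the screen
offset = playback_offset(selected, recording_start)
```

Here the designated memo maps to a point 310 seconds into the recording, so playback can begin there without searching the signal.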
Furthermore, in Japanese Laid-Open Patent Publication 6-276478, an apparatus is described which accomplishes playback by correlating, on the basis of a time stamp, the successively input audio signals or video signals with still images designated by the operator.
In addition, in Japanese Laid-Open Patent Publication 6-205151, an apparatus is described wherein an index is appended to the input audio signal or input video signal when it is detected that the user input has been interrupted for a set time. During playback, one of the specific user-input data items displayed on the screen is designated and the audio signal or video signal starting from the index portion corresponding to the designated user-input data is played back.
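One way to picture indexing by interruption of user input is the following Python sketch. It is an illustration under stated assumptions, not the method of the cited publication: the function name `group_by_pause`, the threshold value, and the choice to place each index at the first input of a group are all hypothetical.

```python
def group_by_pause(input_times, pause_threshold):
    """Split user input times into groups separated by pauses.

    A new group starts whenever the gap since the previous input is at
    least pause_threshold seconds.
    """
    groups = []
    for t in sorted(input_times):
        if groups and t - groups[-1][-1] < pause_threshold:
            groups[-1].append(t)  # still within the current burst of input
        else:
            groups.append([t])    # input resumed after a pause
    return groups


times = [3.0, 4.0, 5.5, 20.0, 21.0, 60.0]  # hypothetical input times (s)
groups = group_by_pause(times, pause_threshold=10.0)
# Assumption: one index per group, placed at the group's first input.
index_points = [g[0] for g in groups]
```

With these sample times the inputs fall into three groups, and designating any memo in a group would start playback from that group's index.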
However, the data storage apparatuses described above have a construction that records all of the input audio signal or video signal without compression and consequently, it is difficult to record a lengthy input audio signal or input video signal with a limited recording capacity. In general, when recording time-series data, such as successively input audio signals or video signals over a long period of time, the required storage capacity becomes enormous.
A method has been proposed wherein the audio signal or video signal is always stored on the storage medium while being compressed. In general, all of the input audio signals or video signals are stored using the same compression ratio. With this method, it is impossible to conserve storage capacity by recording only the important portions with high audio quality and high image quality. Thus, even data having a small likelihood of being referenced later is recorded in a large volume. It is therefore impossible to record audio and video signals taking into account the amount of available storage capacity and the importance of the data being recorded.
For example, assume that when recording the scenery of an interview over a lengthy time using Video for Windows ("Microsoft Video for Windows 1.0 User's Guide", pp. 57-59, pp. 102-108), the thinning compression ratio is set so that only one frame for every five seconds of the video signal is stored with the aim of conserving storage capacity. Here, the problem is created that even if the person doing the recording wants to play back portions that were felt to be important during recording, it is only possible to play back the video signal at one frame every five seconds. It is thus impossible to reproduce the actions (gestures and the like), the manner of speech and the subtle nuances of the person speaking in the stored video. Conversely, when all of the input video signal is recorded at 30 frames per second, the storage capacity needed to store a lengthy interview becomes enormous.
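The trade-off in this example can be made concrete with a short Python calculation. The one-hour duration is a hypothetical figure chosen for illustration; only the two frame rates come from the example above.

```python
def frames_thinned(duration_s, seconds_per_frame):
    """Frames stored when one frame is kept every seconds_per_frame seconds."""
    return duration_s // seconds_per_frame


def frames_full(duration_s, frames_per_second):
    """Frames stored when every input frame is kept."""
    return duration_s * frames_per_second


duration_s = 60 * 60                     # hypothetical one-hour interview
thinned = frames_thinned(duration_s, 5)  # one frame every five seconds
full = frames_full(duration_s, 30)       # 30 frames per second
ratio = full // thinned
```

For one hour this gives 720 frames against 108,000 frames, a factor of 150. The thinned setting conserves capacity but applies that factor uniformly, to the important portions as well.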
Hence, in Japanese Laid-Open Patent Publications 2-305053 and 7-15519, an audio data storage apparatus is discussed which, when it is confirmed that the empty capacity of the storage medium is less than a certain amount, secures the empty areas of the storage medium by recompressing the audio data that has already been stored.
However, the apparatuses described in Japanese Laid-Open Patent Publication 2-305053 and Japanese Laid-Open Patent Publication 7-15519 are apparatuses for recompressing the audio signal stored using the same compression ratio for the entire signal. This creates the problem of not being able to record the important areas with a lower compression ratio and higher audio quality.
In addition, in Japanese Laid-Open Patent Publications 5-64144 and 5-134907, data storage apparatuses are discussed which, when the usage volume (data storage volume) of the image storage medium exceeds a predetermined amount, conserve the storage capacity by successively compressing the image data already stored starting with the old frames and by thinning out frames. These comprise apparatuses that conserve storage capacity by overwriting previously stored data with newly input data and by increasing the compression ratio of data that was stored first.
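The oldest-first policy described above can be sketched in Python as follows. This is a simplified illustration, not the implementation of the cited publications: the function name `thin_oldest_first`, the per-segment frame counts, the limit, and the rule of keeping one frame in every two are all assumptions, and the sketch makes only a single pass over the segments.

```python
def thin_oldest_first(frame_counts, limit, keep_every=2):
    """Thin stored segments, oldest first, until the total fits the limit.

    frame_counts lists the frames held by each stored segment, oldest
    first. Each thinned segment keeps one frame out of every keep_every.
    """
    counts = list(frame_counts)
    for i in range(len(counts)):
        if sum(counts) <= limit:
            break  # usage is back under the predetermined amount
        # Ceiling division: frames remaining after thinning this segment.
        counts[i] = -(-counts[i] // keep_every)
    return counts


stored = [300, 300, 300, 300]  # hypothetical frame counts, oldest first
after = thin_oldest_first(stored, limit=900)
```

In this example only the two oldest segments are thinned, which shows why such a policy degrades a recording purely according to its age.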
In addition, in data storage apparatuses for storing conferences, lectures, news gathering and interviews, when the apparatus is structured so as to simply retain the new recording as important data and erase the old recording as unnecessary data, such as is described in Japanese Laid-Open Patent Publications 5-64144 or 5-134907, the problem arises that recordings of important conferences or important news gathering or the like are overwritten by newly input data simply because those recordings were made first. In general, it is not possible to determine the level of importance of conference contents or news gathering contents merely on the basis of the date and time that the conference or news gathering was conducted.
In addition, the motion image recording apparatus mentioned in Japanese Laid-Open Patent Publication 6-149902 is an apparatus which accomplishes automatic scene change detection. When creating a digest, the apparatus extracts the data in order, starting with the scenes that have high importance so that the data reaches the time length designated by the user. Scenes are deemed to be more important the longer they are. With the apparatus described in this disclosure, it is possible to conserve storage capacity without losing important data if the apparatus retains only the scenes contained in the digest and deletes scenes not contained in the digest.
On the other hand, in Japanese Laid-Open Patent Publications 3-90968 and 6-149902, apparatuses are proposed which automatically create a video digest so as to reach the time length designated by the user. The apparatus described in Japanese Laid-Open Patent Publication 3-90968 is such that the user inputs the importance of each scene beforehand using an editor and, when a digest is created, the apparatus extracts the scenes starting with those having the highest importance so that the data reaches the time length designated by the user. In the case of this apparatus also, it is possible to conserve storage capacity without losing important data if the apparatus retains only scenes contained in the digest.
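Digest creation of this kind can be sketched in Python as a greedy selection. The function name `build_digest`, the scene data, and the rule of skipping a scene that would exceed the designated length are assumptions made for illustration; the cited publications may select scenes differently.

```python
def build_digest(scenes, target_length):
    """Pick scenes in descending importance without exceeding target_length.

    scenes is a list of (name, length_s, importance) tuples in recording
    order. The digest is returned in recording order.
    """
    chosen = []
    total = 0
    # sorted() is stable, so equally important scenes keep recording order.
    for scene in sorted(scenes, key=lambda s: s[2], reverse=True):
        if total + scene[1] <= target_length:
            chosen.append(scene)
            total += scene[1]
    return [s for s in scenes if s in chosen]


scenes = [  # hypothetical scenes: (name, length in seconds, importance)
    ("opening", 40, 1),
    ("proposal", 90, 3),
    ("aside", 10, 1),
    ("decision", 60, 2),
]
digest = build_digest(scenes, target_length=160)
digest_names = [name for name, _, _ in digest]
```

Supplying the scene length itself as the importance value would correspond to a rule in which longer scenes are deemed more important, while user-entered values correspond to importance input beforehand with an editor.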
In addition, with the apparatus described in Japanese Laid-Open Patent Publication 6-149902, it is extremely difficult to partition the scenes by cut changes and scene changes when a conference or lecture is photographed using an unmanned camera. This creates the problem of not being able to detect the scene length. Additionally, when a conference or lecture is photographed, there are times when important statements are included even in short scenes. Consequently, the problem arises that it is not possible to determine the level of importance of conference contents or news gathering contents on the basis of the scene length alone.
Furthermore, with the apparatus described in Japanese Laid-Open Patent Publication 3-90968, the problem arises that it is extremely difficult to partition the scenes by cut changes and scene changes when a conference or lecture is photographed using an unmanned camera. Additionally, the work of inputting the level of importance using an editor after photography has been completed is extremely troublesome, so that the problem arises that this is not suitable for applications such as recording conferences or lectures.
Meanwhile, apparatuses have been proposed in which recorded data are either selected or rejected during recording, as well as apparatuses in which only data verified as important are recorded. For example, in Japanese Laid-Open Patent Publication 7-129187, an apparatus is described which records only a set time interval worth of the audio signal around the time when a sound incorporation key is pressed. In addition, in Japanese Laid-Open Patent Publication 6-343146, a method is described wherein video signals are recorded only for a set time interval with the timing input by the user. Furthermore, among commercially available tape recorders, there are models which have a silent period detection function so that audio is not stored during periods of silence.
However, these apparatuses do not have a means for recompressing data after the data has once been recorded. Consequently, it is impossible to change the compression ratio stepwise depending on the length of the storage period of the data or to change the compression ratio dynamically in accordance with changes in the empty storage capacity of the storage medium. Thus, the problem arises that the compression efficiency is extremely poor in comparison with methods for recompressing the image or audio data that has been stored.
In addition, as described in Japanese Laid-Open Patent Publications 7-129187 and 6-343146, in order to record time-series data slightly prior to detection of the trigger, a recording-use buffer memory for temporarily recording the input time-series data is necessary, creating the problem that the apparatus becomes more complex and more expensive.
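The need for a buffer memory can be seen in the following Python sketch of pre-trigger recording. The class name `PreTriggerRecorder`, the sample counts, and the handling of repeated triggers are hypothetical and serve only to illustrate the general technique.

```python
from collections import deque


class PreTriggerRecorder:
    """Keep recent samples so recording can begin slightly before a trigger."""

    def __init__(self, pre_samples, post_samples):
        # Bounded buffer: the oldest sample is dropped automatically when full.
        self.buffer = deque(maxlen=pre_samples)
        self.post_samples = post_samples  # samples kept from the trigger onward
        self.remaining = 0
        self.recorded = []

    def feed(self, sample, trigger=False):
        if trigger and self.remaining == 0:
            # Copy the buffered samples that preceded the trigger.
            self.recorded.extend(self.buffer)
            self.buffer.clear()
            self.remaining = self.post_samples
        if self.remaining > 0:
            self.recorded.append(sample)
            self.remaining -= 1
        else:
            self.buffer.append(sample)


recorder = PreTriggerRecorder(pre_samples=2, post_samples=3)
for n in range(10):
    recorder.feed(n, trigger=(n == 5))  # hypothetical key press at sample 5
```

Samples 3 and 4 are retained only because they were still held in the buffer when the key was pressed. Every input sample must therefore pass through this additional memory, whether or not it is ever recorded.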
Furthermore, with these apparatuses, the data that can be played back is strictly limited to recorded signals within a set time interval. Consequently, the problem arises that it is completely impossible to play back motion images other than the portion input with the user input timing. In addition, these apparatuses have the problems of the impossibility of recording what is spoken by the speaker from the starting portion and of having the recording terminate before the speaker has finished speaking.
Additionally, apparatuses that select and record only data recognized to be important are known. For example, Japanese Laid-Open Patent Publications 7-129187 and 6-343146 disclose an apparatus which records sounds for a certain amount of time before and after pressing of a sound incorporation key and a method in which video is recorded for a certain amount of time designated by a user, respectively.
However, with the methods used in the apparatuses of Japanese Laid-Open Patent Publications 7-129187 and 6-343146, it is not possible to specify a person who expresses opinions most frequently during a conference and store only the audio data or the image data of the specified person with high quality. It is also not possible to create a digest by extracting scenes in order of importance to fill the length of time designated by the user. In other words, these methods have the problem that compression of the audio or image data may not be executed using data which is obtained immediately after completion of recording of the audio data or image data.