Recording videos with a smartphone or a digital video recorder has become commonplace. However, the person recording the video is generally excluded from the captured footage. For example, a father may wish to record a family event, but he is outside the scene, and the only indication of his presence is the audio signal. Although the father can turn the camera around afterward to capture video of himself, his real-time reactions and expressions during the family event have already been lost. Therefore, there remains a need for a method and system for recording a video memory that includes both the photographer and the scene participants at the same time.
U.S. Patent Application Publication 2011/0243474 to Ito, entitled “Video image processing apparatus and video image processing method,” discloses presenting relevant information about an object of interest to a viewer at an appropriate time, based on the display state of objects that appear in a video image. A video image processing apparatus processes additional information, including content data and relevant information about the respective objects. A display feature information calculation unit acquires frame data indicating the display state of an object to be displayed in each frame of the video data and calculates display feature information about that object for each frame. A frame evaluation unit evaluates each frame against an evaluation criterion relating to the degree of attention the object attracts within the frame, based on the calculated display feature information. A display timing determination unit determines, in accordance with the frame evaluation result, the frame at which display of the relevant information about the object is to start. A display data generation unit generates data for displaying the relevant information about the object, and a superimpose unit superimposes that data on the video data and outputs the superimposed data to a display unit.
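The frame-evaluation and display-timing steps described for Ito can be illustrated with a minimal sketch. The feature set (`area_fraction`, `center_offset`, `visible`), the scoring formula in `attention_score`, and the parameters `threshold` and `min_run` are hypothetical choices made for illustration only; they are not taken from the cited publication. The sketch scores each frame and returns the first frame that begins a sustained run of high-attention frames, which stands in for the frame at which display of the relevant information would start.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ObjectState:
    """Hypothetical display-feature information for one object in one frame."""
    area_fraction: float  # fraction of the frame area the object occupies, 0..1
    center_offset: float  # normalized distance of the object centre from the frame centre, 0..1
    visible: bool


def attention_score(s: ObjectState) -> float:
    """Hypothetical degree-of-attention score: larger, more central objects score higher."""
    if not s.visible:
        return 0.0
    return s.area_fraction * (1.0 - s.center_offset)


def display_start_frame(
    states: List[ObjectState], threshold: float = 0.1, min_run: int = 3
) -> Optional[int]:
    """Return the index of the first frame that starts a run of `min_run`
    consecutive frames scoring at or above `threshold`, or None if no such run exists."""
    run = 0
    for i, s in enumerate(states):
        if attention_score(s) >= threshold:
            run += 1
            if run == min_run:
                return i - min_run + 1
        else:
            run = 0  # a low-attention frame breaks the run
    return None
```

Requiring a short run of qualifying frames, rather than a single frame, is one simple way to avoid triggering the overlay on a momentary appearance of the object; the actual criterion used by Ito may differ.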
U.S. Pat. No. 7,443,447 to Hirotsugu, entitled “Camera device for portable equipment,” discloses a camera device that captures a plurality of images with a plurality of cameras and superimposes them to output image data of a superimposed image. A processor produces the superimposed image, which is displayed on a screen and sent by moving-image mail. This approach has the disadvantage that the superimposed image can often obstruct important features of the background image.
U.S. Patent Application Publication 2003/0007700 to Buchanan et al., entitled “Method and apparatus for interleaving a user image in an original image sequence,” discloses an image processing system that allows a user to participate in a given content selection or to take the place of any of the actors or characters in the content selection. The user can modify an image by replacing an image of an actor with an image of the corresponding user (or a selected third party). Various parameters associated with the actor to be replaced are estimated for each frame. A static model of the user (or the selected third party) is obtained. A face synthesis technique modifies the user model according to the estimated parameters associated with the selected actor. A video integration stage superimposes the modified user model over the actor in the original image sequence to produce an output video sequence containing the user (or selected third party) in the position of the original actor.
U.S. Patent Application Publication 2009/0295832 to Susumu et al., entitled “Display processing device, display processing method, display processing program, and mobile terminal device,” discloses a display processing device that includes a face image detecting unit, a position/angle change detecting unit, and a display control unit. The face image detecting unit detects the user's face image based on imaging data output from a camera unit provided on a cabinet. The position/angle change detecting unit detects changes in the position of the user's face image and in the face angle. The display control unit displays a predetermined image on a display unit and then updates it: it moves the displayed image in accordance with changes in the detected face position along the x-axis and y-axis, enlarges or reduces the image based on position changes along the z-axis, and rotates the image in accordance with changes in the face angle so that the image appears as viewed from that angle. The resulting image is displayed on the display unit.
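The mapping from face-pose changes to display changes described for Susumu et al. can be sketched as a simple update rule. This is a minimal illustration under stated assumptions, not the cited publication's method: the names `FacePose`, `DisplayTransform`, and `update_transform` are hypothetical, the x/y translation is assumed to follow the face one-to-one, and the z-axis rule (moving closer enlarges the image, scaling by the ratio of old to new distance) is an assumed sign convention, since the publication as summarized above does not specify one.

```python
from dataclasses import dataclass


@dataclass
class FacePose:
    """Hypothetical detected face pose."""
    x: float      # horizontal face position in the camera image (pixels)
    y: float      # vertical face position in the camera image (pixels)
    z: float      # estimated face-to-camera distance (arbitrary units, must be > 0)
    angle: float  # face angle (degrees)


@dataclass
class DisplayTransform:
    """Hypothetical state describing how the displayed image is drawn."""
    dx: float
    dy: float
    scale: float
    rotation: float


def update_transform(prev: FacePose, curr: FacePose, t: DisplayTransform) -> DisplayTransform:
    """Apply the change between two face poses to the current display transform."""
    # x/y movement of the face moves the displayed image by the same amount (assumed 1:1).
    dx = t.dx + (curr.x - prev.x)
    dy = t.dy + (curr.y - prev.y)
    # Assumed convention: a smaller distance (face closer) enlarges the image.
    scale = t.scale * (prev.z / curr.z)
    # The image is rotated by the change in face angle.
    rotation = t.rotation + (curr.angle - prev.angle)
    return DisplayTransform(dx, dy, scale, rotation)
```

For example, a face that moves 10 pixels right, 5 pixels up, halves its distance to the camera, and turns by 10 degrees would shift the image by (10, -5), double its size, and rotate it by 10 degrees.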
U.S. Pat. No. 7,865,834 to Marcel et al., entitled “Multi-way video conferencing user interface,” discloses a videoconferencing application whose user interface provides multiple participant panels, each displayed in perspective so that the panels appear angled with respect to the user interface window. The participant panels display live video streams from remote participants. A two-way layout provides two participant panels for two remote participants, each angled inward toward a center position. A three-way layout provides three participant panels for three remote participants, namely a left, a center, and a right panel, with the left and right panels angled inward toward the center position.
U.S. Patent Application Publication 2011/0164105 to Lee et al., entitled “Automatic video stream selection,” discloses an automatic video stream selection method in which a handheld communication device captures video streams and generates a multiplexed video stream. The handheld communication device has at least two cameras facing in opposite directions and receives a first video stream and a second video stream simultaneously from the two cameras. The device detects speech activity of a person captured in the video streams; the speech activity may be detected from the direction of the sound or from the person's lip movement. Based on this detection, the device automatically switches between the first video stream and the second video stream to generate a multiplexed video stream that interleaves segments of the first video stream with segments of the second video stream.
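The speech-driven switching described for Lee et al. can be illustrated with a minimal sketch. It assumes that per-frame speech-activity flags for each camera have already been produced upstream (for example from sound direction or lip movement); that detection is not shown. The function names and the `min_hold` parameter, which suppresses rapid back-and-forth switching, are hypothetical and not taken from the cited publication. The output is a list of interleaved segments, each tagged with the stream it is drawn from.

```python
from typing import List, Tuple


def select_streams(front_speaking: List[bool], rear_speaking: List[bool], min_hold: int = 3) -> List[int]:
    """Choose a stream per frame: 0 = first (front) camera, 1 = second (rear) camera.

    `min_hold` (hypothetical) is the minimum number of frames to stay on a
    stream before another switch is allowed."""
    current = 0        # assume the first stream is selected initially
    held = min_hold    # allow an immediate switch at the start
    choice: List[int] = []
    for f, r in zip(front_speaking, rear_speaking):
        # Prefer the side where exactly one person is speaking; otherwise keep the current stream.
        if f and not r:
            wanted = 0
        elif r and not f:
            wanted = 1
        else:
            wanted = current
        if wanted != current and held >= min_hold:
            current, held = wanted, 0
        else:
            held += 1
        choice.append(current)
    return choice


def to_segments(choice: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-frame choices into (stream, start_frame, end_frame_exclusive) segments."""
    segments: List[Tuple[int, int, int]] = []
    start = 0
    for i in range(1, len(choice) + 1):
        if i == len(choice) or choice[i] != choice[start]:
            segments.append((choice[start], start, i))
            start = i
    return segments
```

For instance, with speech on the first camera for two frames followed by speech on the second camera, the sketch yields one segment from each stream; the hold-off keeps the multiplexed output from flickering when detection toggles every frame.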
In an alternative embodiment of Lee et al., the handheld phone may provide a “picture-in-picture” feature that can be activated by the user. When the feature is activated, the video stream of interest is shown on the entire area of the display screen, while the other video stream is shown in a thumbnail-sized area at a corner of the display screen. For example, in the interview mode, the image of the person who is talking can be shown on the entire display screen, while the image of the person who is not talking is shown in a thumbnail-sized area at a corner of the display screen. The multiplexed video stream includes interleaved segments of the first and second video streams, with each frame of the multiplexed video stream containing “a picture in a picture,” in which a small image from one video stream is superimposed on a large background image from the other video stream. However, as with the aforementioned U.S. Pat. No. 7,443,447, this approach has the disadvantage that the superimposed video image can often obstruct important portions of the background video stream.
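The picture-in-picture compositing described above, and the obstruction it causes, can be illustrated with a minimal sketch. Frames are modeled as grayscale 2-D lists of pixel intensities; the function names, the nearest-neighbour downscaling, the `factor` and `margin` parameters, and the choice of the bottom-right corner are illustrative assumptions, not details of the cited publications. The inner loop makes the drawback explicit: every pixel of the thumbnail overwrites, and therefore hides, a pixel of the background frame.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of pixel intensities


def downscale(frame: Frame, factor: int) -> Frame:
    """Nearest-neighbour downscale: keep every `factor`-th row and column."""
    return [row[::factor] for row in frame[::factor]]


def picture_in_picture(main: Frame, inset: Frame, factor: int = 4, margin: int = 1) -> Frame:
    """Overlay a thumbnail of `inset` in the bottom-right corner of `main`."""
    out = [row[:] for row in main]  # copy so the input frame is left unchanged
    thumb = downscale(inset, factor)
    h, w = len(main), len(main[0])
    th, tw = len(thumb), len(thumb[0])
    if th + margin > h or tw + margin > w:
        raise ValueError("thumbnail does not fit inside the main frame")
    top, left = h - th - margin, w - tw - margin
    for r in range(th):
        for c in range(tw):
            # This background pixel is replaced by the thumbnail, i.e. obscured.
            out[top + r][left + c] = thumb[r][c]
    return out
```

With an 8x8 background and an 8x8 inset downscaled by a factor of 4, a 2x2 block of the background near the bottom-right corner is covered; any important content in that region would be hidden.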
U.S. Patent Application Publication 2011/0001878 to Libiao et al., entitled “Extracting geographic information from TV signal to superimpose map on image,” discloses a method for extracting geographic information from a TV signal in order to superimpose a map on the image. Either optical character recognition (OCR) is used to extract text from a TV image, or voice recognition is used to extract text from the TV audio signal. If a geographic place name is recognized in the extracted text, a relevant map is displayed in a picture-in-picture window superimposed on the TV image. The user may be given the option of turning the map feature on and off, defining how long the map is displayed, and defining the scale of the map to be displayed.