Video can be an effective medium for capturing information and communicating it to others, and it has become increasingly important, not only for unilateral broadcasting of information to a population but also as a mechanism for facilitating bidirectional communication between individuals. Recent advances in compression schemes and communication protocols have made communicating with video over cellular and data networks more efficient and accessible. As a result, applications or “apps” that generate video content are fast becoming the preferred means of sharing information, educating, and advertising products or services. This content is increasingly designed for and viewed on mobile devices such as smartphones, tablets, and wearable devices. When video content is shared, it must be transferred efficiently through wired and/or wireless cellular and data networks. However, challenges remain to the use and distribution of video over cellular and data networks.
One such challenge is the bandwidth constraint on transmitting video over networks. Typically, the (memory) size of a video depends on its length. For example, raw digital video captured at high resolution produces a data file that is often too large to be transmitted efficiently. Under the transmission constraints of some networks, if a sender captures a high-resolution video and transmits it to a recipient, the recipient may have to wait seconds or minutes before the video is received and rendered. This time lag is both inconvenient and unacceptable.
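The dependence of raw size on duration can be made concrete with a back-of-the-envelope calculation. The sketch below assumes uncompressed 24-bit RGB frames (3 bytes per pixel) and 30 frames per second; neither value comes from the text above, and real capture pipelines often store frames in smaller chroma-subsampled formats. The function name is illustrative.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size of uncompressed video, assuming every pixel is stored with a
    fixed number of bytes (3 for 24-bit RGB) and no compression at all."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * seconds

# One and two minutes of 1080p video at an assumed 30 frames per second.
one_minute = raw_video_bytes(1920, 1080, fps=30, seconds=60)
two_minutes = raw_video_bytes(1920, 1080, fps=30, seconds=120)

print(one_minute / 1e9)               # roughly 11.2 (GB)
print(two_minutes == 2 * one_minute)  # True: size grows linearly with duration
```

Under these assumptions a single minute of raw 1080p video occupies about 11.2 GB, which is why uncompressed transmission over mobile networks is impractical.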
To minimize the strain on mobile networks, video content is most often transmitted in compressed form. Videos are compressed by hardware or software algorithms called codecs. These compression/decompression methods are based on removing redundancy in video data (Wade, Graham (1994). Signal coding and processing (2 Ed.). Cambridge University Press. p. 34. ISBN 978-0-521-42336-6). Video data may be represented as a series of still image frames. When the frames are displayed to the viewer at frame rates greater than 24 frames per second, the viewer perceives the image as being in motion, i.e., as a video. For example, the noted algorithms analyze each frame and compare it to adjacent frames to look for similarities and differences. Instead of transmitting each entire frame, the codec sends only the differences between a reference frame and subsequent frames. The video is then reconstructed frame by frame from this difference data. Some of these methods are inherently lossy (i.e., they lose some of the original video quality), while others may preserve all relevant information from the original, uncompressed video.
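The difference-based scheme described above can be illustrated with a toy lossless encoder. This is a minimal sketch, not a real codec: it assumes each frame is already flattened into a list of integer pixel intensities and uses the immediately preceding frame as the reference (one of several possible choices). Real codecs add motion compensation, transforms, and entropy coding on top of this idea.

```python
from typing import List

Frame = List[int]  # a frame flattened to a list of pixel intensities

def encode(frames: List[Frame]) -> List[Frame]:
    """Keep the first frame as the reference; replace every later frame
    with its pixel-wise difference from the previous frame."""
    encoded = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(prev, cur)])
    return encoded

def decode(encoded: List[Frame]) -> List[Frame]:
    """Rebuild the video frame by frame by adding each difference
    to the previously reconstructed frame."""
    frames = [list(encoded[0])]
    for diff in encoded[1:]:
        frames.append([p + d for p, d in zip(frames[-1], diff)])
    return frames

video = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 12, 11, 10]]
deltas = encode(video)
print(deltas)                   # [[10, 10, 10, 10], [0, 2, 0, 0], [0, 0, 1, 0]]
print(decode(deltas) == video)  # True
```

The difference frames are mostly zeros when little changes between frames, which is the redundancy a codec exploits; because the differences here are exact integers, this particular sketch is lossless.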
Such frame-based compression can be done by transmitting either (i) the difference between the current frame and one or more adjacent (preceding or following) frames, referred to as “interframe” compression; or (ii) the difference between each pixel and its adjacent pixels within a frame (i.e., image compression frame by frame), referred to as “intraframe” compression. The interframe method is problematic for mobile transmission because, if the data connection is momentarily lost, the reference frame is lost and must be retransmitted along with the difference data. The intraframe method avoids this issue, because each frame can be decoded without reference to any other frame, and is therefore more commonly used for digital video transmission.
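By contrast, an intraframe scheme encodes each frame using only its own pixels. The sketch below uses simple left-neighbour differencing within a flattened frame as a stand-in for real intraframe coding (actual codecs use spatial prediction and transforms); the function names are illustrative. It shows the property the text relies on: losing one frame does not prevent the remaining frames from being decoded.

```python
from typing import List

Frame = List[int]  # a frame flattened to a list of pixel intensities

def intra_encode(frame: Frame) -> Frame:
    """Encode one frame on its own: keep the first pixel, then store each
    pixel as its difference from the pixel to its left."""
    return [frame[0]] + [cur - prev for prev, cur in zip(frame, frame[1:])]

def intra_decode(encoded: Frame) -> Frame:
    """Undo intra_encode using only this frame's own data."""
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

video = [[10, 10, 12, 12], [10, 11, 12, 13], [9, 9, 9, 9]]
encoded = [intra_encode(f) for f in video]

# Simulate a dropped packet: the first encoded frame never arrives.
received = {i: e for i, e in enumerate(encoded) if i != 0}

# Every remaining frame still decodes correctly, because none of them
# depends on the lost frame.
recovered = {i: intra_decode(e) for i, e in received.items()}
print(recovered)  # {1: [10, 11, 12, 13], 2: [9, 9, 9, 9]}
```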
Examples of the most prevalent methods include MPEG-4 Part 2, H.263, MPEG-4 Part 10 (AVC/H.264), and the more recent H.265 (HEVC). Finally, these codecs may be further optimized for transmission over mobile phone networks, for example using the 3GP or 3G2 standard.
The output size of these frame-based compression methods still depends on the size of the initial raw digital video file, because each frame is encoded and then decoded one by one. The compressed size is therefore dependent on the duration of the raw video. For example, a video recorded in 480p (i.e., 640×480 pixels) with a duration of 1 minute creates an MPEG-4 video file of 28.2 MB. Uploading this 1-minute video file over a 3G wireless network connection (data transmission rate of 5.76 Mbps, or 0.72 MB/s) takes approximately 39 seconds. For the same 1-minute video at 1080p (HD) resolution, however, the upload time balloons to 164 seconds, or 2 minutes and 44 seconds. Although faster HSDPA and LTE data protocols are prevalent in North America, they currently serve only approximately 10-15% of the world's 7 billion mobile phone users.
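The upload figures above follow from dividing file size by transfer rate. The sketch below reproduces the 480p figure and, working backwards, the file size implied by the quoted 164-second 1080p upload. It ignores protocol overhead and assumes 1 byte = 8 bits, consistent with the text's conversion of 5.76 Mbps to 0.72 MB/s; the helper names are illustrative.

```python
def upload_seconds(file_size_mb: float, rate_mbps: float) -> float:
    """Seconds needed to send file_size_mb megabytes over a link running at
    rate_mbps megabits per second (8 bits per byte, no protocol overhead)."""
    return file_size_mb / (rate_mbps / 8)

def implied_size_mb(seconds: float, rate_mbps: float) -> float:
    """Inverse calculation: the file size that a given upload time implies."""
    return seconds * (rate_mbps / 8)

print(round(upload_seconds(28.2, 5.76)))     # 39     (480p, 1 minute)
print(round(implied_size_mb(164, 5.76), 1))  # 118.1  (implied 1080p size, MB)
```

Because upload time is proportional to file size, and file size grows with both resolution and duration, the 1080p clip takes roughly four times as long as the 480p clip over the same link.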
Another challenge regarding the use and distribution of video as a means of communicating is the lack of a user-friendly, resource-efficient platform that allows users to create video messaging threads. For example, in recent years a wide variety of text messaging applications or “apps” have been introduced for use on phones, smartphones, and laptops. While many of these apps allow images and videos to be added to a text message thread, they are not designed or optimized for video messaging, because they require multiple steps to create and send video messages. These multiple steps are cumbersome and, on most devices, not intuitive.
To illustrate this point, conventional text messaging applications, such as native text messaging applications on phones or smartphones and third-party messaging apps (e.g., WhatsApp Messenger from WhatsApp Inc., Facebook Messenger from Facebook, Kik Messenger from Kik Interactive Inc.), typically require iterative interactions between the user and the user's phone (e.g., tapping a button, adding text, and swiping or other gestures) before a video can be incorporated into a message. For example, on an Apple iPhone 5 (iOS version 6.1.4), creating a video message using the native “Messages” app requires a minimum of nine (9) distinct user steps or interactions. A similar number of steps is required for a user to respond to a video message with another video message. This process is not only time-consuming and cumbersome, requiring the user to first identify the correct steps and then execute them quickly and without error, but also an inefficient use of computing resources.
Some conventional video sharing apps offer some improvement in both the number of steps and the time required to create a video using a mobile device. Examples of such video sharing apps are Keek from Keek Inc., Vine from Vine Labs, Inc., Viddy from Viddy Inc., and Instagram video from Facebook. Creating a video message in these conventional video sharing applications, however, also requires multiple steps. For example, on a Samsung Galaxy Note 2 (OS version 4.1.2), creating and sending a video message using Facebook's Instagram video app requires six (6) distinct steps. Additionally, most of these conventional video sharing apps, with the exception of Keek, cannot be used for video messaging (e.g., an exchange of sequential videos between individuals, including video messages and video responses), as they provide no capability to respond to an initial video message with a video message. Furthermore, most of these apps upload videos to application servers in the app foreground, thereby suspending use of the device until the upload completes, which is an inefficient use of computing resources.
Some conventional video messaging apps offer further improvement in both the number of steps and time compared with text messaging and video sharing platforms. See U.S. Patent Application 20130093828. Examples of such conventional video messaging apps are Snapchat from Snapchat Inc., Eyejot from Eyejot, Inc., Ravid Video Messenger from Ravid, Inc., Kincast from Otter Media, Inc., Skype video messaging from Microsoft, and Glide from Glide Talk, Ltd. These conventional video messaging apps, however, still maintain the format and structure of text-based messaging. This message-and-response framework works well for text-based messages but remains slow, cumbersome, and difficult to navigate for video messages and responses.
Furthermore, while some video platforms allow video to be delivered with additional features and functionalities, such as text transcripts and clickable hot spots that link to other content or information, associating these features with a video or including them in it can require additional steps or time, which makes it inefficient to provide supplemental information in or with distributed videos. For example, in the case of YOUTUBE, speech recognition is performed by a speech-to-text recognition engine, or transcription is done manually by the author, after a video is uploaded to a remote server. Creating the text transcript in this way can take from several minutes to several hours, depending on several factors. U.S. Patent Publication No. 2012/0148034 describes a method for transcribing speech.
Some conventional video platforms can be used to embed supplemental data, such as hot spots, into a video after the video has been created, such that the hot spots are added to or overlaid on the video. As one example, when a hot spot in a video is scrolled over, the video can pause and the hot spot can become active, providing either information or links to additional information. As another example, U.S. Patent Publication No. 2012/0148034 provides the ability for the author or a recipient of a video to pause playback of the video at a particular time and record a response in the context of the content of the original message included in the video. When the original message is played back, the author or a recipient will be able to hear or see the message and see the embedded hot spot or thumbnail indicating a response. When this thumbnail is clicked, the recipient is taken to the previously recorded response. As with the prior art cited earlier, these hot spots are added only after the initial video is complete and are viewed or reviewed upon playback (i.e., the prior art requires that speech recognition and transcription take place only after the video has been completed or uploaded).
The slow and tedious video creation process of conventional apps does not easily facilitate (i.) the creation of video messages and responses (herein “video thread”); (ii.) the creation of video threads by multiple users or respondents; (iii.) communication and collaboration between users where context and tonality are required; (iv.) the creation of multi-user or crowdsourced video content used to communicate information about an activity, product, or service; or (v.) the addition of supplemental content to videos.
Furthermore, the present disclosure relates to multimedia (e.g., picture and video) content delivery, preferably over a wireless network. Particularly, the present disclosure relates to dynamically optimizing the rendering of multimedia content on wireless mobile devices. Still more particularly, the present disclosure relates to the dynamic rendering of picture or video content regardless of device display orientation or dimensions or device operating system embellishments.
Another challenge is that the devices on which videos are played back have different specifications and hardware configurations, which can cause a video to be displayed improperly. For example, mobile devices tethered to a wireless network are fast becoming the preferred means of creating and viewing a wide variety of image content. Such content includes self-made or amateur pictures and videos, video messages, movies, etc., and is created on hand-held mobile devices. This content is then delivered to a plurality of mobile devices and rendered on their displays. These mobile devices, such as smartphones, are manufactured and distributed by a multitude of original equipment manufacturers (OEMs) and carriers. Each of these devices has potential hardware and software impediments that prevent the delivered content from being viewed “properly,” e.g., with the picture or video rendered in the correct orientation for the way the playback device is held (not rotated 90° to the right or left or upside down) and in the same aspect ratio as when it was captured or recorded. These impediments include differing display hardware, resolutions, aspect ratios, and sizes, as well as customized software overlays that alter the display behavior of the native operating system (OS). For example, the created video often has a different resolution and/or aspect ratio than the display of the device on which it is viewed. This mismatch prevents the video from being properly rendered on the recipient's screen.
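One generic way to avoid stretched or compressed renderings, once the source dimensions are known, is to scale the image uniformly until it fits the display and fill the remainder with bars. The sketch below illustrates that standard letterboxing calculation; it is not presented as the method of this disclosure, and the function name and example dimensions are hypothetical.

```python
from typing import Tuple

def fit_preserving_aspect(src_w: int, src_h: int,
                          disp_w: int, disp_h: int) -> Tuple[int, int, int, int]:
    """Scale a src_w x src_h image to the largest size that fits inside a
    disp_w x disp_h display without changing its aspect ratio.
    Returns (width, height, x_offset, y_offset) of the centred image; the
    leftover area is filled with bars (letterboxing or pillarboxing)."""
    scale = min(disp_w / src_w, disp_h / src_h)
    out_w = round(src_w * scale)
    out_h = round(src_h * scale)
    return out_w, out_h, (disp_w - out_w) // 2, (disp_h - out_h) // 2

# A 640x480 (4:3) video on a 1080x1920 portrait phone screen.
print(fit_preserving_aspect(640, 480, 1080, 1920))  # (1080, 810, 0, 555)
```

Scaling each axis independently to fill the whole screen would instead stretch or compress the picture, which is one of the faults this section describes; using a single scale factor for both axes is what preserves the captured aspect ratio.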
Specifically, faults that can negatively affect viewing or playback of an image include the image being rotated to an orientation other than the orientation of the playback device and the image not being rendered in the aspect ratio in which it was initially captured or recorded. For example, videos rendered on playback devices can be rotated 90° to the left or right, rotated 180° (upside down), vertically or horizontally compressed or stretched, or, in some cases, only partially rendered. In the most severe case, the video may not render at all and the application terminates or crashes.
These faults can be caused by the inability of the playback device to read the encoded metadata that accompanies the video file. This metadata can contain information about the image such as its dimensions (resolution in width and height), orientation, bitrate, etc. The OS and the picture or video playback software (or app) of the playback device use this information to render the image correctly on its display. The inability to read the encoded metadata can stem from the use of an older OS, from conversion of the picture or video to an incompatible format, from resizing that strips the metadata outright, or from other causes. Examples of this older-OS incompatibility may be found in versions of the Android OS prior to API Level 17 (Android 4.2). On devices running these earlier OS versions, the orientation metadata is not recognized or used. There are also situations in which the OEM of the playback device (PBD) has modified the OS with overlays. Such modifications can prevent the PBD from properly reading some or all of the picture or video metadata, causing the image to be rendered incorrectly.
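When a player cannot read or apply the orientation metadata, an application can apply the rotation itself. The sketch below assumes the rotation tag is a clockwise angle in multiples of 90 degrees (Android exposes a rotation angle of 0, 90, 180, or 270 degrees through `MediaMetadataRetriever.METADATA_KEY_VIDEO_ROTATION`, added in API level 17) and represents a frame as a small 2D list of pixel values. `display_frame` and its `player_honours_rotation` flag are hypothetical names used only for illustration.

```python
from typing import List

Frame = List[List[int]]  # rows of pixels, top row first

def rotate_clockwise(frame: Frame, degrees: int) -> Frame:
    """Rotate a frame clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    for _ in range((degrees // 90) % 4):
        # One 90 degree clockwise turn: reverse the row order, then transpose.
        frame = [list(row) for row in zip(*frame[::-1])]
    return frame

def display_frame(frame: Frame, rotation_metadata: int,
                  player_honours_rotation: bool) -> Frame:
    """Hypothetical helper: if the player cannot use the rotation tag (for
    example, the OS ignores it or the tag was stripped), rotate manually."""
    if player_honours_rotation:
        return frame  # the player applies the rotation itself
    return rotate_clockwise(frame, rotation_metadata)

stored = [[1, 2, 3],
          [4, 5, 6]]  # 3 wide x 2 tall, as stored by the capturing device
print(display_frame(stored, 90, player_honours_rotation=False))
# [[4, 1], [5, 2], [6, 3]]
```

Note that a 90 or 270 degree correction also swaps the frame's width and height, so any aspect-ratio fitting has to use the dimensions after rotation.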
In a small, discrete universe of devices, these impediments can be addressed and overcome, as with a monolithic operating system such as the iOS operating system from Apple, Inc. For such a family of devices, the number of unique devices and display dimensions or resolutions is low, e.g., approximately 20 devices and approximately 5 unique versions of the iOS operating system. Corrections to the application delivering the image content can therefore be made case by case for each device and OS version.
However, for devices and operating systems that are open source and allow a large amount of hardware and OS variation, such as the Android OS from Google Inc., the unique device-OS combinations number in the thousands. Additionally, due to the nature of this industry, new devices are introduced daily. Therefore, corrections for image resolution mismatches, orientation errors, and software issues quickly become impossible to address case by case.
The industry has addressed this issue by using detailed libraries in the code that provide the necessary information for each possible device and its respective display size, aspect ratio, and software limitations. With such a library, when a particular device calls for playback, the image is delivered, and the app compares the playback device's specifications to the library and makes the appropriate corrections. This methodology is inefficient, in part because of the delay introduced by using an additional application that contains an extensive device library. It is also prone to error because the library must be constantly updated. See U.S. Pat. Nos. 8,359,369; 8,649,659; and 8,719,373; and U.S. Patent Publication Nos. 20130103800; 20120240171; 20120087634; and 20110169976, each of which is incorporated by reference in its entirety.
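The library-based approach, and its main weakness, can be sketched as a lookup table keyed by device model. The model names and profile fields below are hypothetical placeholders and are not taken from any of the cited patents; the point is only that a device missing from the table receives no correction until the library is updated.

```python
from typing import Dict, NamedTuple, Optional

class DeviceProfile(NamedTuple):
    """Hypothetical per-device record of the facts needed for corrections."""
    width: int
    height: int
    honours_rotation_metadata: bool

# Hypothetical library entries; a real library would hold thousands.
DEVICE_LIBRARY: Dict[str, DeviceProfile] = {
    "phone-model-a": DeviceProfile(720, 1280, False),
    "phone-model-b": DeviceProfile(1080, 1920, True),
}

def lookup_profile(model: str) -> Optional[DeviceProfile]:
    """Return the stored profile for a device, or None if the library has
    no entry (for example, a model released after the last update)."""
    return DEVICE_LIBRARY.get(model)

print(lookup_profile("phone-model-b"))
print(lookup_profile("phone-model-new"))  # None: the library must be updated
```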
The present disclosure relates to a system and method of rendering any image, e.g., playing back a video, regardless of its resolution or initial orientation, on any playback device whose display resolution, orientation, OS, or modifications differ from those of the capturing device, such that the image is rendered without anomalies or faults.