Loss Tolerant Data
Video data, especially video surveillance data, demands an ever greater amount of space in storage systems. As camera resolutions and frame rates rise, as sensors emitting video data proliferate, and/or as the number and complexity of monitored configurations increase, the aggregate data generated grows more voluminous and, as a consequence, requires more and more storage. To meet this need, system administrators must choose between adding ever-larger collections of storage drives, adding higher-capacity drives to existing configurations, or both. The drive industry now offers drives exceeding 10 terabytes (TB), and drive storage capacity is only likely to continue to increase. Unfortunately, traditional approaches to keeping data storage reliable, such as data replication, erasure coding, and Redundant Array of Independent Disks (RAID) methods, become liabilities as drive capacity exceeds 6 TB. Also, there are significant costs to provision extra storage capacity for these solutions. Additionally, the replication and recovery mechanisms introduce complexity that affects reliability and performance.
For example, most RAID configurations require identically sized drives to support their RAID policy correctly, reducing the flexibility to leverage advances in storage capacity per drive. Most importantly, when a failed drive must be replaced, the vast size of these new drives means that a tremendous amount of time is needed to “rebuild” the redundancy and restore the system to full protection. Rebuild time is the time taken to re-create the lost mirror copy or parity configuration on the replacement drive so that the system again has the capacity to tolerate a subsequent drive failure. During this rebuild time, the system is vulnerable to another drive failure. Specifically, another failure may cause the loss of all the information in the storage configuration. As various industries deploy drives of massive capacity, this rebuild time can stretch into several weeks or longer, increasing the chances that the valuable data stored on these systems will be lost entirely due to subsequent failures before the rebuild process has completed.
Problems associated with data loss due to drive failure or malfunction are particularly acute for certain kinds of data, for example, bank transactional data, stored data structures, configuration information, event logs, etc., where every bit of data matters because each piece of data may have unique and impactful significance. Any data loss can create a critical situation, deeply compromising the system or the human stakeholders of the data.
For other kinds of data, specifically certain kinds of streaming data, the information arrives repeatedly, potentially timestamped for later resequencing, and each piece shares characteristics with both the data preceding it and the data following it. These shared characteristics make it possible to extrapolate a missing piece from its neighbors. In other words, this type of data is loss-tolerant.
An example of loss-tolerant data is shown in FIG. 1. The data might be a stream of readings from a sensor indicating the amount of current on an electrical wire. The sensor issues a reading every few seconds for the purpose of monitoring energy use in an office building. If an appliance is turned on immediately after the (i−1)th time of data collection, the sensor reading d_i is significantly higher than d_{i−1}. If the entire data stream is lost, there is no easy way to know the amount of energy in the wire for the time period of the lost data. However, the loss of just d_i is tolerable because d_{i+1} will contain the higher reading as well. Ideally, no data is lost, but losing a single piece of data d_i is not catastrophic.
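The idea that a single lost reading can be estimated from its neighbors may be sketched as follows. This is only an illustration under stated assumptions: the function name fill_single_gaps, the midpoint estimate, and the sample current values are hypothetical and are not part of the described system.

```python
from typing import Optional


def fill_single_gaps(readings: list[Optional[float]]) -> list[Optional[float]]:
    """Estimate missing readings (None) from their immediate neighbours.

    A gap with readings on both sides gets the midpoint of the two; a gap
    with one known neighbour copies it; anything else stays None.
    """
    filled = list(readings)
    for i, value in enumerate(readings):
        if value is not None:
            continue
        prev_ok = i > 0 and readings[i - 1] is not None
        next_ok = i + 1 < len(readings) and readings[i + 1] is not None
        if prev_ok and next_ok:
            # Midpoint of the neighbours: a crude linear interpolation.
            filled[i] = (readings[i - 1] + readings[i + 1]) / 2
        elif next_ok:
            filled[i] = readings[i + 1]
        elif prev_ok:
            filled[i] = readings[i - 1]
    return filled


# Hypothetical current readings in amperes. An appliance turns on after
# index 1, and the reading at index 2 was lost.
stream = [2.0, 2.0, None, 9.0, 9.0]
recovered = fill_single_gaps(stream)
```

The estimate for the lost reading is not exact, but the following reading still reveals the higher load. When the whole stream is lost there are no neighbors to draw on, and every entry stays missing.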
Video stream data is a sequence of video frames captured over a window of time and, depending on the application, has this loss-tolerant property. For video surveillance applications, the video data also has a finite lifetime, defined by a video retention requirement, which further limits the value of going to excessive lengths to preserve every video frame.
Since video is in the form of multiple images (up to 30 or more frames) per second, the consequence of data loss could be only partial and the perception of this loss is in the form of small fractions of lost video for an overall healthy stream. There are different compression strategies that put higher values on some frames over others, but, instead of going to great lengths to preserve every piece of data, if there is a solution that merely mitigates degradation to gain higher efficiencies and robustness for the overall stream, then there is a potential for improvement over traditional methods. Furthermore, if the solution can support complex video encodings, it may be extended to nearly any loss-tolerant data stream.
Existing Storage Mechanisms
Storage drives are physical and complex devices with moving mechanical and solid-state electronic components. They are also complex systems with embedded controlling software and protocols. In this reality, an individual drive, no matter how well it was manufactured, is likely to fail at some point after it has been put into service. If the data stored on these drives is important, then there is a need to design some reliability through redundancy.
Redundant Array of Independent Disks, abbreviated as RAID, is a method that offers automatic redundancy to counter this problem of individual drive failure. RAID is a data storage virtualization technology that combines multiple physical disk drive components into a single logical unit for the purpose of data redundancy and performance improvement. By placing data on multiple drives, input/output (I/O) operations can overlap in a balanced way, improving performance. A RAID controller is a hardware device or software program or a combination of hardware and software used to manage disk drives in a computer or a storage array so they work as a logical unit.
To keep things simple in a RAID-based storage system, an application writes a single file to a filesystem, and the underlying filesystem, or the storage subsystem supporting it, takes responsibility for retaining the data reliably even in the context of drive failures. The most common approaches are either replicating the data across multiple drives or using computational methods that can derive any lost information when a drive fails.
Replication is a commonly used scheme to counter the undesirable consequence of drive failure. RAID 1 policy replicates blocks of data being stored on a mirrored set of drives. Hadoop Filesystem replicates at the file level across multiple systems using multiple drives.
Replication policies such as RAID 1, shown in FIG. 2, save a block-for-block copy on each drive. This requires at least twice as much storage, since the data is written twice. Write performance remains reasonable because the copies can be written in parallel.
When a drive fails, as shown in FIG. 3, the data is retrieved from the surviving drive(s). When a new replacement drive is added to the configuration, the Replication manager must rebuild the new drive to make it an identical copy of the surviving drive to restore the disk group to full protection.
The benefit of this method lies in write performance, since the copies can be written in parallel. In some cases, such as the Hadoop Filesystem, data can also be read in parallel, providing greater read throughput. The penalty of this method lies in the extra cost of the storage that must be duplicated, triplicated, or copied even more times to gain the extra reliability and performance. Moreover, when a replacement drive is added, the rebuilding work consists of copying all the data from the surviving drive(s) to the new drive.
Compute-based policies address shortcomings of replication-based policies, such as the added cost of the extra drives. To support a single terabyte of storage under a replication-based policy, the user must purchase 2 terabytes (or more, depending on the policy). Compute-based policies such as RAID 5, RAID 6, and erasure coding techniques require fewer than 2× the number of drives and still support reliable storage.
In the example of RAID 5, shown in FIG. 4, a single extra drive is used to save parity information for sectors of user data stored on the remaining data drives in a disk group. In this way, a group of 5 drives can be made reliable by adding only a 6th parity drive (as opposed to 5 more drives if replication was used).
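The drive counts in this comparison can be checked with a small calculation. The sketch below is illustrative only; the function names and the default of two copies for replication are assumptions made for the example.

```python
def drives_required(data_drives: int, scheme: str,
                    copies: int = 2, parity_drives: int = 1) -> int:
    """Total drives needed to protect `data_drives` drives of user data."""
    if scheme == "replication":
        # Every data drive is stored `copies` times.
        return data_drives * copies
    if scheme == "parity":
        # User data plus a fixed number of parity drives.
        return data_drives + parity_drives
    raise ValueError(f"unknown scheme: {scheme}")


def usable_fraction(data_drives: int, total_drives: int) -> float:
    """Share of the purchased capacity that holds user data."""
    return data_drives / total_drives


mirrored_total = drives_required(5, "replication")   # 5 data + 5 mirror
parity_total = drives_required(5, "parity")          # 5 data + 1 parity
mirrored_usable = usable_fraction(5, mirrored_total)
parity_usable = usable_fraction(5, parity_total)
```

For five data drives, mirroring needs ten drives in total and leaves half of the purchased capacity usable, while single parity needs six drives and leaves five sixths usable.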
A drive is configured as an array of blocks of storage. A disk group is a collection of drives that all have the same number of blocks of storage (i.e., the same size). If there are ‘n’ drives in the disk group, the drives may be labeled D_1, D_2, D_3, . . . , D_n, with an extra drive D_{n+1} used for computation.
Each block in a disk is a collection of bits (0 or 1). If the disks in the disk group all have the same size of ‘m’ blocks, then each block is addressable as D_iB_j, where ‘i’ corresponds to the disk number in the disk group and ‘j’ corresponds to the block in that disk.
New data written to block ‘x’ of drive ‘y’ prompts a computation to determine a value to store in the parity drive:
D_{n+1}B_x = F(D_1B_x, D_2B_x, . . . , D_yB_x, . . . , D_nB_x)
For RAID 5, the operator F( ) is XOR.
Then, if any drive is lost, all blocks within the collection of drives can be reconstituted by using the same formula, as follows:
D_yB_x = F(D_1B_x, . . . , D_{y−1}B_x, D_{y+1}B_x, . . . , D_nB_x, D_{n+1}B_x)
Moreover, when a new drive is added back into the configuration, the new drive can be rebuilt using this same calculation across each block of storage in the new drive. This is shown in FIGS. 5 and 6.
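The parity computation and the reconstruction of a lost block described above can be illustrated with a minimal sketch. It assumes three hypothetical data drives, each contributing a two-byte block; the function name xor_blocks and the sample byte values are illustrative and are not taken from any RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (the operator F for RAID 5)."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the bytes at the same offset across all blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Block x on each of n = 3 hypothetical data drives.
data = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]

# Value stored in block x of the parity drive.
parity = xor_blocks(data)

# The second drive fails: XOR the surviving data blocks with the parity
# block to recover the lost block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, the same function serves both to compute parity and to reconstitute a lost block. Rebuilding a replacement drive repeats this step for every block, which requires reading the corresponding block from every surviving drive.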
The benefit of this technique lies in its much smaller capital overhead in terms of the extra storage needed to provide reliability. The downside is the performance cost: the computation and latency of writing new parity blocks every time a data block is written, as well as the cost of computing missing data when a drive fails. As the capacity of disk drives grows, rebuilds can take weeks, and data integrity is no longer guaranteed because drives have a finite uncorrectable read error rate. Finally, as with replication, rebuilding a replacement drive is expensive, since the parity calculation is needed for every block of storage when a new replacement drive is added to such a configuration. This rebuilding takes time not only to compute but also to read blocks from all the other drives in order to compute the proper data to store in each block of the replacement drive.
This technique can be made more reliable by adding extra parity drives, at the cost of increasing the overhead per byte of usable storage and the latencies associated with parity calculations.
All these costs may be reasonable in situations where absolute data integrity is an inherent requirement. However, for some kinds of data, like surveillance video, these costs may be too onerous. For example, with 10 TB drives, by the time a lost drive has been rebuilt, the data retention period may already have passed. Because of these advancements in drive capacity, video data presents special challenges and opportunities to design storage solutions that provide good-enough reliability in a cost-effective way.
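The relationship between rebuild time and retention period can be made concrete with a rough estimate. The sketch below is illustrative only: the effective rebuild rates and the seven-day retention period are hypothetical values chosen for the example, and real rebuild rates depend on the controller, the drive, and the competing workload.

```python
SECONDS_PER_DAY = 86_400


def rebuild_days(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Days needed to rewrite a whole drive at a sustained rebuild rate."""
    capacity_bytes = capacity_tb * 1e12
    seconds = capacity_bytes / (rebuild_mb_per_s * 1e6)
    return seconds / SECONDS_PER_DAY


def rebuild_outlasts_retention(capacity_tb: float, rebuild_mb_per_s: float,
                               retention_days: float) -> bool:
    """True when the rebuild finishes only after the data has expired."""
    return rebuild_days(capacity_tb, rebuild_mb_per_s) >= retention_days


# Hypothetical case: a 10 TB drive rebuilt at an effective 10 MB/s because
# the array keeps serving live recording traffic during the rebuild.
throttled = rebuild_days(10, 10)
# The same drive rebuilt at an assumed 200 MB/s on an otherwise idle array.
idle = rebuild_days(10, 200)
```

Under these assumed rates the throttled rebuild takes more than eleven days, longer than a seven-day retention period, while the idle rebuild finishes in well under a day.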
Problems in Existing Storage Mechanisms to Handle Video Data
Video data streams are a flow of many frames (images) in sequence over a window of time. Metadata associated with each stream is expressed as image resolution, frame rate (measured in frames per second, i.e., fps), encoding standard (also known as codec), etc. Depending on the requirements, users may choose to adjust any or all of these parameters to improve their system's performance and quality.
Common encoding standards are MJPEG, MPEG-4, H.264, H.265, et cetera. MJPEG is a simple stream of individual frames. An individual high-resolution camera (e.g., 10 megapixels or more) running at 30 frames per second with MJPEG can generate a massive amount of data. Losing one or two frames of data is akin to a section of a film being taken out of the middle of a scene. The scene appears to jump at that point.
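The scale of an MJPEG stream can be estimated with simple arithmetic. The sketch below is illustrative only: the figure of one compressed bit per pixel is an assumption made for the example, since actual JPEG frame sizes vary with scene content and quality settings.

```python
SECONDS_PER_DAY = 86_400


def mjpeg_bytes_per_second(megapixels: float, fps: float,
                           compressed_bits_per_pixel: float) -> float:
    """Approximate data rate of a stream of independently coded frames."""
    pixels = megapixels * 1e6
    bytes_per_frame = pixels * compressed_bits_per_pixel / 8
    return bytes_per_frame * fps


def terabytes_per_day(bytes_per_second: float) -> float:
    """Daily volume, in decimal terabytes, of a continuously recorded stream."""
    return bytes_per_second * SECONDS_PER_DAY / 1e12


# A 10-megapixel camera at 30 fps, assuming 1 compressed bit per pixel.
rate = mjpeg_bytes_per_second(10, 30, 1.0)
daily = terabytes_per_day(rate)
```

Under this assumption a single camera produces 37.5 MB per second, or roughly 3.24 TB per day of continuous recording.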
Other codecs leverage the fact that only a fraction of each image may vary from the previous image, allowing transmission and storage of only the differences between images, with periodic reference frames used to ensure that overall base image quality is maintained. For example, when a frame or two encoded with these kinds of codecs is lost in transmission, such as through an interruption of a satellite television signal, a viewer may see a somewhat pixelated section of the screen where some subsets of the image (i.e., small rectangles on the screen) show residual images from an earlier scene. Eventually, more data arrives, and the image appears whole.
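How far the damage from one lost frame extends can be sketched with a deliberately simplified model. The model assumes that every frame whose index is a multiple of a hypothetical reference_interval is a self-contained reference frame and that every other frame depends only on its predecessor; real codecs have more elaborate dependency structures, so this is an illustration rather than a description of any particular standard.

```python
def affected_frames(lost_index: int, reference_interval: int,
                    total_frames: int) -> list[int]:
    """Frames that cannot be shown cleanly after a single frame is lost.

    Frame k is treated as a reference frame when k % reference_interval == 0.
    The damage lasts from the lost frame until the next reference frame.
    """
    if reference_interval < 1:
        raise ValueError("reference_interval must be at least 1")
    if not 0 <= lost_index < total_frames:
        raise ValueError("lost_index is outside the stream")
    next_reference = (lost_index // reference_interval + 1) * reference_interval
    return list(range(lost_index, min(next_reference, total_frames)))


# Every frame independent (as with MJPEG): only the lost frame is missing.
independent = affected_frames(4, 1, 20)
# A reference frame every 6 frames: a lost difference frame degrades the
# picture until the next reference frame arrives.
difference_lost = affected_frames(4, 6, 20)
# Losing a reference frame affects every frame that builds on it.
reference_lost = affected_frames(6, 6, 20)
```

In this model the loss is always bounded by the next reference frame, which matches the observation that the image eventually appears whole again.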
For video surveillance systems, users may choose to record video data from a plurality of cameras (numbers may vary from a few cameras to several thousand cameras) on a continuous basis. All this video data is only useful if something happens in the scene that is interesting and needs to be preserved and analyzed. Otherwise, it is merely consuming space on disk until its retention window is reached and then it is deleted to make room for the constant stream of new video data being created.
What is important about these systems is that the video is stored long enough to react to specific events, examples of which include:
Investigating a break-in or skirmish,
Understanding traffic patterns within a store,
Understanding a quality breakdown in a manufacturing facility.
Different institutions have different requirements as to how long to retain the data. The greater the dependency on higher-resolution cameras, higher frame rates, and longer retention requirements, the greater the need to harness the benefits of the new generation of hard drives to save data. Unfortunately, as mentioned earlier, the onerous rebuilding costs associated with these extremely large hard drives make them a liability when using replication or computational means for making drives reliable. The vast majority of video surveillance data has a fixed lifetime. Most videos are kept for a specified duration, called a retention period. After the retention period, the video data is deleted. The retention goal ranges from days to years, depending on the requirement.
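Deletion at the end of the retention period can be sketched as a simple age check. The function name expired_recordings, the clip names, and the 14-day retention period below are hypothetical and serve only to illustrate the rule.

```python
from datetime import datetime, timedelta


def expired_recordings(recorded_at: dict[str, datetime], now: datetime,
                       retention: timedelta) -> list[str]:
    """Names of recordings whose age exceeds the retention period."""
    return sorted(
        name for name, stamp in recorded_at.items()
        if now - stamp > retention
    )


# Hypothetical clips and their recording times.
now = datetime(2024, 1, 31)
clips = {
    "cam1-a": datetime(2024, 1, 1),
    "cam1-b": datetime(2024, 1, 20),
    "cam2-a": datetime(2023, 12, 15),
}
to_delete = expired_recordings(clips, now, timedelta(days=14))
```

Only the clips older than 14 days are selected, so the space they occupy can be reclaimed for newly recorded video.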
Reserving varying degrees of otherwise unused capacity to defend against any one drive failure is costly. Moreover, the performance penalties associated with writing and reading this data can force a degree of overprovisioning as well. However, the final liability is that, as the storage capacity of a single drive grows, the amount of real time required to rebuild replacement drives and return the configuration to a fully defended system is approaching the point of making these systems unreliable if large-capacity drives are used.
For large-capacity drives, RAID is no longer practical. Uncorrectable read errors compromise data integrity, and rebuilds lasting weeks leave the system vulnerable to losing all of its data to further drive failures. The performance penalty of erasure coding makes it unsuitable for streaming live video.
The system design in which Video Management System (VMS) applications write data files arbitrarily to storage, and a reliable storage subsystem takes care of the rest, is approaching practical limits: increasing drive size to accommodate growing space requirements can lead to catastrophic impacts when drives fail. The alternative is to use only smaller drives, which forces the use of multiple, distinct disk groups to avoid large latency impacts or massive windows of replacement-drive rebuild time.