It is sometimes required to update content stored in a storage device. For example, if the content is software, or a program (such as an executable file), it is sometimes required to fix a bug existing therein or introduce new features thereto. Yet, the latter example is non-limiting and other types of content may also require updates, such as text, data stored in a database, etc. Hereinafter the terms “old version” or “original version” refer to a version of content before update, and the terms “new version” or “updated version” refer to a version that includes already updated content. In other words, an original version includes “original content” while an updated version includes “updated content”. It should be noted that updated content can be further updated. In case of a second update, for example, the updated content of the first update becomes the original content of the second update, while new updated content is generated by the second update, and so on.
A process that updates original content yielding updated content is referred to as an “update process”. The update process usually requires instructions, instructing it how to perform the update. Such instructions provided to the update process constitute together an “update package”, wherein each instruction included therein constitutes an “update command”. That is, an update process obtains an update package as input, and operates in accordance therewith in order to update the original content to updated content. This is non-limiting though and sometimes an update process can obtain more than one update package allowing it, together, to update the content. Alternatively, instead of obtaining an update package, the update process can sometimes retrieve an update package (or a set of update commands) from a storage device or from a database etc. Hence, hereinafter, when referring to the term “obtaining an update package” it should be appreciated that the update process can passively obtain the package, it can actively retrieve the package or sometimes it can activate a package embedded therein (e.g., a hard coded set of update commands).
One way to update an original version to an updated version is storing the updated version in the storage device in addition to the original version. For example, a computer program “prog.exe” is activated whenever a user presses a certain icon on the PC (Personal Computer) Windows desktop. In order to update prog.exe it is possible to store the updated version of this file in a different location than the present (original) version, and then reset the path associated with the icon so as to activate the updated version instead of the original version. Later, when it is ascertained that the update process completed successfully, the original version can be deleted safely, releasing the space occupied thereby. Until then, however, the storage device must hold both versions at the same time. In addition, this latter update method requires that the complete updated version be provided to the update process, e.g., in the update package. Such an update package easily becomes huge in size, and if it is required to transmit it to the updated device via bandwidth-limited communication channels, transmission may become cumbersome and sometimes even impossible. Therefore, it is preferable that the size of the update package be reduced.
Another update method can simply overwrite original content with updated content. This update method is risky and unreliable: if the update process fails in the middle of operating, when part of the original version is already overwritten while only part of the updated version is written to the storage device, the version stored on the storage device at the time of interruption is probably invalid or inoperable. In addition, the requirement to transmit the complete updated version is not yet solved with this method. Yet, it is noted that updating content by overwriting the original content with the updated content is commonly referred to in the art as “in-place update”. Hereinafter, unless specifically noted, the term “update” is used to describe “in-place update”.
One way for reducing the size of an update package is by including in it information representing the differences between the original and updated content. Such an update package is sometimes referred to also as a “difference”, a “difference result” or a “delta”. The update process, upon operating in accordance with a delta, applies it to the original content, hence producing the updated content.
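The way an update process applies a delta can be illustrated with a minimal sketch. The command shapes ("copy" and "insert") and the function name apply_delta are assumptions made for illustration only and are not taken from any of the documents cited herein. For simplicity the sketch builds the updated content in a separate buffer, i.e., it is not an in-place update.

```python
from typing import List, Tuple, Union

# Hypothetical update command shapes (assumptions for illustration):
#   ("copy", offset, length)  reuse bytes already present in the original content
#   ("insert", data)          add bytes that exist only in the updated content
Command = Union[Tuple[str, int, int], Tuple[str, bytes]]


def apply_delta(original: bytes, delta: List[Command]) -> bytes:
    """Produce updated content by applying the delta to the original content."""
    updated = bytearray()
    for command in delta:
        if command[0] == "copy":
            _, offset, length = command
            updated += original[offset:offset + length]
        elif command[0] == "insert":
            updated += command[1]
        else:
            raise ValueError(f"unknown update command: {command[0]!r}")
    return bytes(updated)


original = b"The quick brown fox"
# Only the four changed bytes travel in the delta; the rest is copied
# from the original content already present on the device.
delta = [("copy", 0, 4), ("insert", b"slow"), ("copy", 9, 10)]
updated = apply_delta(original, delta)
```

In this example the delta carries four literal bytes, whereas the complete updated version is eighteen bytes long, which is the size reduction the delta is meant to achieve.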
Because the size of the delta matters, there are methods that try to reduce it. For example, U.S. Pat. No. 6,546,552 (“Difference extraction between two versions of data-tables containing intra-references”, published 2003) discloses a method for generating a compact difference result between an old program and a new program. Each program includes reference entries that contain references that refer to other entries in the program. According to the method of U.S. Pat. No. 6,546,552, the old program is scanned and for each reference entry, the reference is replaced by a distinct label mark, whereby a modified old program is generated. In addition, according to U.S. Pat. No. 6,546,552, the new program is scanned and for each reference entry the reference is replaced by a distinct label mark, whereby a modified new program is generated. Thus, utilizing directly or indirectly the modified old program and modified new program, the difference result is generated.
WO 2004/114130 (“Method and system for updating versions of content stored in a storage device”, published 2004) discloses a system and method for generating a compact update package between an old version of content and a new version of content. The system of WO 2004/114130 includes a conversion element generator for generating a conversion element associated with the old version and new version. It also includes a modified version generator for generating a modified version, and an update package generator for generating the compact update package. The compact update package includes the conversion element and a modified delta based on the modified version and the new version.
WO 2005/003963 (“Method and system for updating versions of content stored in a storage device”, published 2005) discloses a system and method for updating versions of content stored in a storage device. The system of WO 2005/003963 includes an update module for obtaining a conversion element and a small delta. It also includes a converted old items generator for generating converted old items by applying the conversion element to items of an old version, a data entries generator for generating data entries based on the modified data entries and on the converted old item, and a new version generator for generating a new version of content by applying the commands and the data entries to the old version.
It was noted before that an update package is sometimes referred to as a delta. However, this is non-limiting, and as it appears from WO 2004/114130 and WO 2005/003963, the update package sometimes includes a delta together with other elements.
Other methods exist in the art, but before they are mentioned, several considerations should first be discussed. For example, it is appreciated that content is normally stored in a storage device. A storage device can be a volatile storage device (such as Random Access Memory, RAM) or a non-volatile storage device (such as a hard disk or flash memory).
There are storage devices that are organized in discrete areas, referred to, e.g., as blocks or sectors, wherein one block can include content belonging to more than one file. Hence, if there are, for example, two files stored in a storage device, a single block can include several (‘x’) bytes belonging to a first of the two files, as well as several (‘y’) bytes belonging to a second of the two files. If the size of a block is ‘z’ bytes, it is clear that z>=x+y. Yet, those versed in the art would appreciate that writing content into a block affects other content stored therein. That is, if it is required to re-write the content stored in the x bytes of the first file (e.g., during update thereof), due to storage device limitations it may be impossible to write only those x bytes, and it may be necessary to write the content of all the z bytes to the storage device. This can be done, for example, by reading content stored in the z bytes from the non-volatile storage device to a volatile storage device not including blocks, such as RAM, updating only the content stored in the x bytes in the volatile storage device (that is, the content of the other z-x bytes is left unaffected therein) and then writing the content of the z bytes back to the non-volatile storage device. This limitation characterizes flash memory devices, for example, wherein it is required to completely delete the present content of a block before new content (including updated content) can be written thereto. It also characterizes hard disks, where it is not obligatory to delete the complete sector before writing data thereto, but it is required to write the complete content of a block in one writing operation (e.g., it is impossible to write only x bytes while leaving the content stored in the z-x bytes unaffected; in order to leave the z-x bytes unaffected, it is required to store the content thereof in the volatile memory device and write it back into the block, together with the x bytes).
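The read, modify and write-back sequence described above can be sketched as follows. The class BlockDevice, the function update_bytes and the 16-byte block size are hypothetical and chosen only to keep the illustration small; real devices use far larger blocks and have their own interfaces.

```python
BLOCK_SIZE = 16  # 'z' in the text; an illustrative value only


class BlockDevice:
    """Toy storage device that accepts only whole-block writes (hypothetical)."""

    def __init__(self, num_blocks: int) -> None:
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.block_writes = 0  # number of block write operations performed

    def read_block(self, index: int) -> bytes:
        return self._blocks[index]

    def write_block(self, index: int, content: bytes) -> None:
        if len(content) != BLOCK_SIZE:
            raise ValueError("only complete blocks can be written")
        self._blocks[index] = bytes(content)
        self.block_writes += 1


def update_bytes(device: BlockDevice, index: int, offset: int, new_bytes: bytes) -> None:
    """Change x bytes inside a block by rewriting all z bytes of that block."""
    if offset < 0 or offset + len(new_bytes) > BLOCK_SIZE:
        raise ValueError("update does not fit inside the block")
    buffer = bytearray(device.read_block(index))        # read the z bytes into RAM
    buffer[offset:offset + len(new_bytes)] = new_bytes  # modify only the x bytes
    device.write_block(index, bytes(buffer))            # write all z bytes back


device = BlockDevice(num_blocks=1)
# One block holding x = 6 bytes of a first file and y = 10 bytes of a second file.
device.write_block(0, b"AAAAAA" + b"BBBBBBBBBB")
writes_before = device.block_writes
update_bytes(device, 0, 0, b"aaaaaa")  # update the first file only
block_after = device.read_block(0)
writes_used = device.block_writes - writes_before
```

Updating the six bytes together costs a single block write, and the ten bytes of the second file are preserved because they were carried through the RAM buffer.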
Hence, the update procedure may require many write operations to the storage device including blocks, and it is appreciated that in order to achieve an efficient update, the update should better be optimized. For example, if x equals two bytes, then these two bytes should better be updated together in a single write of the block, instead of updating the first byte and then the second byte and thereby writing the block twice.
Furthermore, when updating an original version (including original content) to an updated version (including updated content), there are sometimes update commands that use original content in order to generate updated content. For example, it is possible to copy original content to a different place in the storage device, wherein this copied content, in its destination place, forms part of the updated version. When copying content to a destination place it should be appreciated that this destination place could have been used before for storing other content (possibly also being part of the original version). Hence, the copied content can overwrite other original content. Still further, it is possible that there is another update command that uses the other original content in order to generate updated content. If this other update command is called further to operating in accordance with the first update command, the other original content can be already overwritten. This situation constitutes a “write before read conflict”.
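A write before read conflict can be reproduced with a minimal sketch. The command shape (source offset, destination offset, length) and the function name run_in_place are assumptions made for illustration only.

```python
from typing import List, Tuple

CopyCommand = Tuple[int, int, int]  # (source offset, destination offset, length)


def run_in_place(content: bytes, commands: List[CopyCommand]) -> bytes:
    """Apply copy commands one after another to a single buffer (in place)."""
    buffer = bytearray(content)
    for src, dst, length in commands:
        buffer[dst:dst + length] = buffer[src:src + length]
    return bytes(buffer)


original = b"AAAABBBB----"
# Intended updated version: b"AAAAAAAABBBB"
first = (0, 4, 4)    # copy the A's over the place where the B's are stored
second = (4, 8, 4)   # copy the B's (original content) to the end

# 'second' reads offsets 4..7 after 'first' has already overwritten them.
conflicting = run_in_place(original, [first, second])

# Running 'second' first reads the B's while they are still intact.
reordered = run_in_place(original, [second, first])
```

In this particular example reordering the two commands is enough to avoid the conflict. When commands depend on each other cyclically (e.g., two areas that swap their content), no ordering avoids it.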
Write before read conflicts are a known problem in the art and U.S. Pat. No. 6,018,747 tries to cope therewith.
U.S. Pat. No. 6,018,747 (“Method for generating and reconstructing in-place delta files”, published 2000) discloses a method, apparatus, and article of manufacture for generating, transmitting, replicating, and rebuilding in-place reconstructible software updates to a file from a source computer to a target computer. U.S. Pat. No. 6,018,747 stores the first version of the file and the updates to the first version of the file in the memory of the source computer. The first version is also stored in the memory of the target computer. The updates are then transmitted from the memory of the source computer to the memory of the target computer. These updates are used at the target computer to build the second version of the file in-place.
According to U.S. Pat. No. 6,018,747, when a delta file attempts to read from a memory offset that has already been written, this will result in an incorrect reconstruction since the prior version data has been overwritten. This is termed a write before read conflict. U.S. Pat. No. 6,018,747 teaches how to post-process a delta file in order to create an in-place reconstructible delta file, minimize the number of write before read conflicts, and then replace copy commands with add commands to eliminate conflicts. A digraph is generated for representing the write before read conflicts between copy commands. A schedule is generated that eliminates write before read conflicts by converting this digraph into an acyclic digraph. Yet, U.S. Pat. No. 6,018,747 uses the delta file in order to back up, or protect, content overwritten during write before read conflicts. Hence, the delta file is enlarged.
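The general idea of scheduling copy commands over a conflict digraph, and of replacing a copy command with an add command where a cycle remains, can be sketched as follows. This is only a loose illustration under stated assumptions and is not the algorithm of U.S. Pat. No. 6,018,747: the command shapes, the function names and the policy of converting the lowest-numbered command in a cycle are all hypothetical.

```python
from typing import List, Tuple

Copy = Tuple[int, int, int]  # (source offset, destination offset, length)


def overlaps(a_start: int, a_len: int, b_start: int, b_len: int) -> bool:
    return a_start < b_start + b_len and b_start < a_start + a_len


def schedule(original: bytes, copies: List[Copy]) -> list:
    """Order the copies so no command reads an area that was already written."""
    n = len(copies)
    # must_precede[i] holds every j whose destination overlaps the source of i,
    # i.e., i has to run before j overwrites what i still needs to read.
    must_precede = [set() for _ in range(n)]
    for i, (src_i, _, len_i) in enumerate(copies):
        for j, (_, dst_j, len_j) in enumerate(copies):
            if i != j and overlaps(src_i, len_i, dst_j, len_j):
                must_precede[i].add(j)

    remaining, converted, ordered = set(range(n)), set(), []
    while remaining:
        ready = [j for j in sorted(remaining)
                 if not any(j in must_precede[i] for i in remaining if i != j)]
        if not ready:
            # A cycle remains: turn one copy into an add so it no longer reads.
            victim = min(i for i in remaining if must_precede[i] & remaining)
            must_precede[victim].clear()
            converted.add(victim)
            continue
        pick = ready[0]
        src, dst, length = copies[pick]
        if pick in converted:
            # The add command carries literal bytes, which enlarges the delta.
            ordered.append(("add", dst, original[src:src + length]))
        else:
            ordered.append(("copy", src, dst, length))
        remaining.remove(pick)
    return ordered


def apply_in_place(content: bytes, commands: list) -> bytes:
    buffer = bytearray(content)
    for command in commands:
        if command[0] == "copy":
            _, src, dst, length = command
            buffer[dst:dst + length] = buffer[src:src + length]
        else:
            _, dst, data = command
            buffer[dst:dst + len(data)] = data
    return bytes(buffer)


original = b"AAAABBBB"
swap_halves = [(4, 0, 4), (0, 4, 4)]  # each command reads what the other writes
commands = schedule(original, swap_halves)
result = apply_in_place(original, commands)
```

In this example the two copy commands form a cycle, so one of them is replaced by an add command carrying the four bytes b"BBBB". The update then completes correctly in place, at the cost of a larger delta.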
Another known problem in the art occurs when a process of updating an old version is interrupted before its normal termination, such as in a power failure. In such a case, there is a possibility that the content of the block which was updated during the interruption may become corrupted and contain unexpected content.
It was already mentioned before that when updating blocks of content, an original content of a block sometimes forms part of the input used by the update process. In such a case, if the original block (which is corrupted due to interruption) is required, the update process may be unable to resume. It can be impossible to re-update the corrupted block.
U.S. Pat. No. 6,832,373 (“System and method for updating and distributing information”, published 2004), for example, tries to cope with this problem. It discloses devices, systems and methods for updating digital information sequences that are comprised by software, devices, and data. In addition, these digital information sequences may be stored and used in various forms, including, but not limited to, files, memory locations, and/or embedded storage locations. Furthermore, the devices, systems, and methods described in U.S. Pat. No. 6,832,373 provide a developer skilled in the art with an ability to generate update information as needed and, additionally, allow users to proceed through a simplified update path, which is not error-prone and which, according to the inventors of U.S. Pat. No. 6,832,373, may be performed more quickly than through the use of technologies existing when U.S. Pat. No. 6,832,373 was filed.
That is, U.S. Pat. No. 6,832,373 describes using an auxiliary backup block, wherein all block update operations are performed in two phases, an approach referred to as a ‘two-phase protocol’ or ‘two-phase commit’. According to U.S. Pat. No. 6,832,373, in a first phase of updating a block, the update process writes the updated content to the auxiliary backup block and verifies that the content is correctly stored. In a second phase, the update process writes the updated content into its target block to form the updated content of the updated block. Yet, variations of the same method exist, such as copying the original content of the updated block into the auxiliary backup block in the first phase, and in the second phase updating the target block to store the updated content.
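A two-phase block update with an auxiliary backup block can be sketched as follows. This is a minimal simulation under stated assumptions and not the implementation of U.S. Pat. No. 6,832,373: the Storage class, the simulated PowerFailure, the checksum stored next to the backup and the recover function are all hypothetical.

```python
import zlib


class PowerFailure(Exception):
    """Simulated interruption (hypothetical, for illustration only)."""


class Storage:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.backup = None         # auxiliary backup block: (index, content, checksum)
        self.write_count = 0
        self.fail_at_write = None  # write number on which power "fails"

    def _interrupted(self) -> bool:
        self.write_count += 1
        return self.write_count == self.fail_at_write

    def write_backup(self, index: int, content: bytes) -> None:
        checksum = zlib.crc32(content)
        if self._interrupted():
            self.backup = (index, b"\xff" * len(content), checksum)  # torn write
            raise PowerFailure()
        self.backup = (index, content, checksum)

    def write_block(self, index: int, content: bytes) -> None:
        if self._interrupted():
            self.blocks[index] = b"\xff" * len(content)  # torn write
            raise PowerFailure()
        self.blocks[index] = content


def two_phase_update(storage: Storage, index: int, updated: bytes) -> None:
    # Phase 1: store the updated content in the backup block and verify it.
    storage.write_backup(index, updated)
    _, stored, checksum = storage.backup
    assert zlib.crc32(stored) == checksum
    # Phase 2: write the updated content into its target block.
    storage.write_block(index, updated)


def recover(storage: Storage) -> bool:
    """After an interruption, finish phase 2 if a valid backup exists."""
    if storage.backup is None:
        return False
    index, content, checksum = storage.backup
    if zlib.crc32(content) != checksum:
        return False  # phase 1 never completed, so the target block is intact
    if storage.blocks[index] != content:
        storage.write_block(index, content)
        return True
    return False


storage = Storage([b"old0", b"old1"])
storage.fail_at_write = 2  # write 1 is the backup, write 2 is the target block
try:
    two_phase_update(storage, 1, b"new1")
except PowerFailure:
    pass
corrupted = storage.blocks[1]
storage.fail_at_write = None
repaired = recover(storage)
```

The sketch also shows the cost of the approach: even without any interruption, every block update performs two write operations, one to the auxiliary backup block and one to the target block.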
Yet, the two-phase commit (whether the backed up content is the original content or the updated content) is time consuming, since every write operation requires performing two operations (one for each of the two phases). In addition, according to U.S. Pat. No. 6,832,373 every backup operation backs up the complete (original or updated) content of a block in the auxiliary backup block, and hence if the number of blocks updated by the update process is n, the total number of operations required for the update process (including update operations and write operations into the auxiliary backup block) cannot be smaller than 2n. If there are blocks into which content is written in more than one write operation, the number of operations that the update process is required to perform will be even larger than 2n.
There is a need in the art, thus, for a reliable and efficient mechanism for updating original content of an original version in place, generating an updated version, where the original version and/or the updated version are stored compressed on a storage device.