Many organizations, corporate entities, and private individuals use automated backup systems to back up and store data. Such automated backup systems are usually offered by third party providers as periodic backup services, either free of charge, for a one-time service charge, or for a periodic fee charged weekly, monthly, or yearly. Data stored in the backup systems can help restore individual files, folders, or even an entire system in the event the original copy of the data becomes corrupted, lost, unreliable, or unavailable for any other reason. Such events can result from intentional, malicious hacking attacks or even from unforeseen system crashes. In many scenarios, data stored in backup systems (e.g., software code) is shared and reused by developers and other technical personnel, who use the backup systems to access a current or previous version of the stored data.
To perform backups, most conventional backup systems retrieve a current working version of the data and store it in its entirety. In other words, if a backup system performs an hourly backup of a website, then the backup system stores a complete working version of the website every hour. However, this approach is disadvantageous because it can quickly lead to large amounts of data being stored in the backup systems, even when little or nothing has changed between backups, and thus requires continuously increasing backup storage space.
In several scenarios, version control software (also called revision control systems), such as GIT, APACHE SUBVERSION, and other open source software programs, is used for performing backups. These software programs allow systematic management and tracking of changes (revisions) to data, including documents, source code, and other information stored as computer files. Changes are usually identified by a number and/or letter code combination, which is termed a “revision number”, “revision level”, “commit id”, or simply a “revision”. For example, the original version of a file might be called “revision 1”. After a first set of changes is made and saved, the resulting version might be termed “revision 2”, and so on. Each revision is generally associated with a timestamp and the username of the person making the change. Version control software can be used to identify differences between two versions, to revert the present version to a previously stored version, and, for some types of files, to merge two versions.
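To illustrate how sequential revisions, timestamps, usernames, and version comparison fit together, the following minimal Python sketch keeps an in-memory revision history for a single file and uses the standard library's `difflib` to list the differences between two revisions. The `Revision` and `RevisionLog` names are hypothetical, chosen for illustration only; this is not how GIT or APACHE SUBVERSION actually store revisions internally.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    """One saved version of a file (hypothetical structure for illustration)."""
    number: int          # sequential revision number: 1, 2, 3, ...
    content: str         # full text of the file at this revision
    username: str        # person who made the change
    timestamp: datetime  # when the revision was saved


@dataclass
class RevisionLog:
    """Hypothetical in-memory revision history for a single file."""
    revisions: list = field(default_factory=list)

    def commit(self, content: str, username: str) -> int:
        """Save a new revision and return its revision number."""
        number = len(self.revisions) + 1
        self.revisions.append(
            Revision(number, content, username, datetime.now(timezone.utc))
        )
        return number

    def get(self, number: int) -> Revision:
        """Look up a revision by its 1-based revision number."""
        return self.revisions[number - 1]

    def diff(self, old: int, new: int) -> list:
        """Return unified-diff lines describing the changes from revision old to new."""
        a = self.get(old).content.splitlines(keepends=True)
        b = self.get(new).content.splitlines(keepends=True)
        return list(difflib.unified_diff(
            a, b, fromfile=f"revision {old}", tofile=f"revision {new}"))

    def restore(self, number: int, username: str) -> int:
        """Revert to an earlier version by committing its content as a new revision."""
        return self.commit(self.get(number).content, username)
```

For example, committing `"line one\n"` as revision 1 and `"line one\nline two\n"` as revision 2, then calling `diff(1, 2)`, yields a diff containing the added line `+line two`; calling `restore(1, ...)` records the old content as revision 3, so the history is never overwritten.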
Notwithstanding the aforementioned benefits, most version control software programs cannot perform manipulations and comparisons on metadata associated with the user's data. Thus, version control software programs perform a complete backup of the end user's data every time a change in the user's data is identified. Further, most data storage providers (third party storage providers) that merely offer storage space also perform a complete backup of the end user's data every time a backup is performed. Furthermore, most third party storage providers (such as website hosting providers) use graphical user interfaces and sophisticated automation tools designed to provide end users with several options and features associated with hosting a website. However, such graphical user interfaces and automation tools can be complicated and even cumbersome, and they usually differ from one third party storage provider to another. Additionally, most website owners use publishing platforms such as WORDPRESS™ to create blogs and websites. Such platforms periodically release software updates, and third party storage providers typically expect website owners to install the updates when they are released or otherwise deal with the consequences. Unfortunately, those consequences can be dire for small and medium-sized businesses; e.g., once a website has been hacked, traffic and revenue drop.
Therefore, there is a long-felt but unresolved need for a streamlined system or method that allows automated retrieval and storage of online digital content that is owned by users (businesses and individuals) and stored on third party storage providers. If the user's data is website hosting data, then the retrieval process should provide the ability to roll back to a previous version of the website whenever needed. To make optimal use of storage space, a preferred retrieval process saves data only when a change is detected in the user's data. Further, the system should notify webmasters, system administrators, website owners, and/or other relevant persons when changes are detected in the retrieved data. An ideal retrieval process should not consume much time, should be easy to perform for individuals with minimal technical skills, and should be repeatable as often as necessary. Also, the system should allow users to set up multiple user accounts to retrieve and back up data relating to multiple websites, wherein the data can be stored at the same or different third party storage providers. Additionally, the system performing the retrieval should periodically ascertain the “health” and reliability of the retrieved data, e.g., whether the data has been exposed to phishing or malware activity. In the event a hack or an unauthorized change is detected, webmasters can then quickly revert to the last known “good” version and have their site working in minutes without engaging a developer to remediate the issue.
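One way to save data only when a change is detected is to compare a content hash of each newly retrieved copy against the hash of the most recently stored copy. The following minimal Python sketch assumes that the retrieved content is available as bytes, and it uses SHA-256 from the standard library's `hashlib`. The `ChangeOnlyBackup` class and the `notify` callback are hypothetical placeholders for storage and for alerting webmasters or administrators. They are not part of any particular provider's API, and the sketch does not attempt to judge whether a change is malicious.

```python
import hashlib
from typing import Callable, List, Optional


class ChangeOnlyBackup:
    """Hypothetical per-site store that keeps a new version only when content changes."""

    def __init__(self, notify: Optional[Callable[[str, int], None]] = None):
        self.versions: List[bytes] = []  # stored versions, oldest first
        self.hashes: List[str] = []      # SHA-256 hex digest of each stored version
        self.notify = notify             # hypothetical alert hook (e.g., email a webmaster)

    def retrieve_and_store(self, site: str, content: bytes) -> bool:
        """Store content if it differs from the latest version; return True if stored."""
        digest = hashlib.sha256(content).hexdigest()
        if self.hashes and self.hashes[-1] == digest:
            return False                 # unchanged since last backup: nothing is saved
        self.versions.append(content)
        self.hashes.append(digest)
        if self.notify and len(self.versions) > 1:
            # A change relative to an earlier stored version was detected.
            self.notify(site, len(self.versions))
        return True

    def rollback(self, version: int) -> bytes:
        """Return a previously stored version (1-based), e.g. the last known good one."""
        return self.versions[version - 1]
```

For example, retrieving the same page content twice stores it only once. Retrieving altered content then stores it as version 2 and triggers the notification, and `rollback(1)` returns the earlier copy. In this sketch, multiple websites would each use their own `ChangeOnlyBackup` instance. Comparing fixed-size digests means only the latest hash, rather than the full previous copy, must be consulted on each retrieval.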