Computer systems contain large amounts of data. This data includes personal data, such as financial data, customer/client/patient contact data, audio/visual data, and much more. Corporate computer systems often contain word processing documents, engineering diagrams, spreadsheets, business strategy presentations, and so on. With the proliferation of computer systems and the ease of creating content, the amount of content in an organization has expanded rapidly. Even small offices often have more information stored than any single employee can know about or locate.
Many organizations have installed content management software that actively searches for files within the organization and creates an index of the information available in each file. The index can then be used to search for and retrieve documents based on a topic. Such content management software generally maintains an index of keywords found within the content, such as words in a document.
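As an illustration only, a keyword index of the kind described above can be sketched as a mapping from each word to the set of documents that contain it. The function names `build_keyword_index` and `search`, the sample file names, and the simple lowercase tokenization are assumptions made for this sketch and do not describe any particular content management product.

```python
import re
from collections import defaultdict


def build_keyword_index(documents):
    """Map each lowercase keyword to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        # Hypothetical tokenization: runs of letters and digits, lowercased.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, keyword):
    """Return the sorted names of documents that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical sample content standing in for files found in an organization.
documents = {
    "strategy.txt": "Quarterly business strategy and budget",
    "diagram_notes.txt": "Engineering diagram for the pump assembly",
    "budget.txt": "Budget spreadsheet summary",
}
index = build_keyword_index(documents)
```

With this sketch, `search(index, "budget")` returns the names of both files that mention the word, and a keyword that appears nowhere returns an empty list.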
Creating a content index generally requires access to all of the computer systems within an organization and can put an unexpected load on already burdened systems. Some organizations defer content indexing until off hours, such as early in the morning, to reduce the impact on system availability. However, other operations may compete for system resources during off hours. For example, system backups are also generally scheduled for off hours. Systems may be placed in an unavailable state while backups are being performed, a period called the backup window, to prevent data from changing. For organizations with large amounts of data, any interruption, such as that from content indexing, jeopardizes the ability to complete the backup during the backup window.
Furthermore, traditional content indexing only identifies information that is currently available within the organization, and may be insufficient to find all of the data required by an organization. For example, an organization may be asked to produce files that existed during a past time period in response to a legal discovery request. Emails from five years ago, or files that have been deleted or are no longer available except on offsite backup tapes, may be required to answer such a request. An organization may be obligated to go through the time-consuming task of retrieving all of this content and conducting a manual search for content related to the request.
There is a need for a system that overcomes the above problems and provides additional benefits.
In the drawings, the same reference numbers and acronyms identify elements or acts with the same or similar functionality for ease of understanding and convenience. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the Figure number in which that element is first introduced (e.g., element 1104 is first introduced and discussed with respect to FIG. 11).
The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the claimed invention.