1. The Field of the Invention
The present invention relates to information management. More particularly, embodiments of the invention relate to systems and methods for automated discovery of environment objects operating in a network, including the acquisition of knowledge by discovering and applying classification techniques to the environment objects.
2. The Relevant Technology
The world is slowly and continually moving from being paper-based to being electronic-based and this is becoming apparent in a wide variety of different systems. Businesses, schools, and even home life are transitioning to electronic systems. For example, email is becoming a primary means of communication rather than sending regular mail. Bills are paid online. Airlines often prefer electronic ticketing and online check-in. The list goes on. There are several reasons for this transition, one of which is the convenience and accessibility of electronic systems. Email, for example, often arrives shortly after it is sent.
As entities become more centered on electronic data, the ability to manage the electronic data becomes crucial for a wide variety of different reasons. Much of the electronic data maintained by an entity or organization often relates to different aspects of the entity and often is subject to various considerations. For example, much of the data of an entity may be dependent on the entity's current business. Data related to a new and upcoming product, for example, may be business-critical data that should be safeguarded in various ways. At the same time, the entity may have older data about a product being phased out that is no longer subject to the same safeguards.
More generally, there are a number of different factors that may determine how certain data is handled or that determine the services that are needed for the data. Some of the factors or considerations include data security, data backup, data retention, data access control, regulatory compliance, corporate compliance, and the like or any combination thereof. Further, much of the data is unstructured at least in the sense that the data's value to the entity is not readily known and thus the services required for the data are not necessarily known.
For example, an entity may have a file storage system that it backs up on a regular basis. However, there may be many files on the file storage system that have little or no value to the entity. As a result, the entity is often paying for services that are not required. Perhaps more importantly, there may be content in the file storage system that is not receiving sufficient service.
In other words, one of the problems faced by users is related to being able to better identify services that are required. As discussed above, current users often have too many or too few services. In the latter case, companies are at risk, for example, if they do not apply retention and protection to all the files that need it, such as files with personal information about employees. However, not all such files are recognized as such, and the files that go unrecognized do not receive the right services.
As a result, there is clearly a need in the industry to enable an entity or user to properly identify and seek the right service levels for the entity's data. At the same time, there is also a need to be able to provide data classification and data reporting, even if existing service levels are not changed. The ability to simply classify data would enable entities to better evaluate the value of their data.
The unstructured nature of most systems often makes it difficult to ensure that the proper services are sought. Making decisions on how to manage the data of an entity is often further complicated by the organization of the entity irrespective of the data. For example, any given entity typically has more than one “line of business.” An engineering firm that performs contract work for the government, for instance, often has data that is associated with the engineering being performed. At the same time, the engineering firm may also have data that is associated with the legal department or corporate aspect of the engineering firm, data that is associated with human resources, and the like. In other words, a given entity often has various domains of data or different shares of data, some of which may be shared by the various lines of business.
In each line of business, there is often data that may be subject to certain requirements that are different from requirements that exist with respect to data in the other lines of business. Further, each line of business may have a different way of referring to types of data. All of these differences combine to make providing information management a complex and difficult process.
In today's world, entities are faced with questions such as identifying the levels of security or retention that apply to various files or needing to know which data is critical to the business. Entities must also account for the effects of time on certain data. Data that is associated with a cancelled project, for example, may no longer require certain services. In addition, entities would like to be able to better value their existing data.
One shortcoming of conventional systems is their failure to understand and account for the network in which they operate. A network, for example, may include multiple levels of storage that need to be secured. If the service provider is unaware of such storage on a network, however, then it may be unable to provide the needed service. Further, it is not practical to manually assign objects to the various service levels because of the sheer number of objects that are typically present in a network. This is one of the reasons that most systems provide either too much or too little protection: they are unable to understand the environment in an automated fashion that accounts for the differences between the network's objects, and those differences often have an impact on the service levels required.
In sum, the data of an entity is an important asset and should be properly safeguarded. This means that services such as backup, retention, encryption, etc., need to be obtained and orchestrated such that entities have neither too few nor too many services for their data. As indicated above, conventional systems do not enable entities to effectively manage their data. As a result, these entities have either too much or too little protection for their data. Entities need a way to manage their data so as to comply with all relevant requirements without purchasing too many services and without providing insufficient services. Entities also need a way to manage their data in an ongoing manner as conditions in the entity change.