1. The Field of the Invention
The present invention relates to information management. More particularly, embodiments of the invention relate to systems and methods for classifying structured and/or unstructured data for use in assigning service areas and service level objectives to objects in a computer system.
2. The Relevant Technology
The world is slowly and continually moving from being paper-based to being electronic-based. This evolution is apparent in almost every aspect of life, from the workplace, to government institutions, to home life. In each area, old paper-based methods of communication and storage are being replaced by electronic information. Businesses have replaced bulky paper files and expensive storage rooms with electronic files and searchable databases. Taxpayers are encouraged to submit returns electronically rather than in paper form, and email is rapidly becoming a principal form of communication.
There are several reasons for this transition, one of which is the convenience and accessibility of electronic systems. Email, for example, often arrives shortly after it is sent, and information submitted electronically can be quickly formatted, processed, and stored without the inconvenience of manually reviewing each submission.
As entities become more dependent on electronic data, the ability to manage electronic data becomes crucial for a variety of reasons. For example, much of the electronic data maintained by an entity or organization relates to different aspects of the entity and is subject to various considerations. Without an effective way to manage the electronic data, it is difficult to apply the appropriate considerations to the data.
Further, often there is a large amount of unstructured data, meaning that the value of the data to the entity is not readily known, nor are the services required to manage the data. For example, an entity may have a file storage system that is regularly backed up, despite the presence of files on the system that have little or no value to the entity. Thus, without an effective way to sort, classify, and maintain the files, the entity pays for unneeded services.
Additionally, there may be data that is subject to certain state and federal regulations based on information stored in the content of the data. Without a method of searching data based on content, certain data or files may not receive the services mandated by the regulations, and the entity may be subject to liability.
Generally, there are a number of factors used to determine how data is handled and which services are needed to properly maintain the data. Some of the factors or considerations commonly used include data security, data backup, data retention, data access control, regulatory compliance, corporate compliance, and the like or any combination thereof.
Because most data systems are unstructured and inadequately classified, it is difficult to ensure that the appropriate services are being applied. In fact, even when one attempts to classify data, decisions on how to manage the data are complicated by limitations based on the organization of the entity, irrespective of the data. For example, any given entity typically has more than one “line of business.” An engineering firm, for example, mainly involved with contract work for the government, often has data that is associated with the actual engineering work being performed. At the same time, the firm may also have data associated with the legal department, human resources, or another administrative aspect of the firm. While some data may belong exclusively to one line of business, other data may be shared between more than one line of business. Some of the data associated with the engineering work, for example, may have legal implications, making it necessary for both lines of business to have access to the data. In other words, a given entity often has various domains of data or different shares of data, which may belong individually to a line of business or may be shared among the various lines of business.
For each line of business, data is often subject to certain requirements that differ from the requirements that apply to data associated with other lines of business. Further, each line of business may have a different way of referring to types of data. Thus, each line of business will likely desire that the data receive a different type of service from an information service, making it difficult to establish a uniform system of classification that will satisfy the demands of each of the lines of business.
Many information management systems known in the current art use a one-dimensional system to determine what levels of service objects receive. Rather than taking into account the realities of current business entities, these systems typically classify objects according to only one service category. This methodology restricts such entities from effectively managing and safeguarding their data. As a result, entities may have too much or too little protection for their data. Thus, there is a need for an information management system that is capable of effectively and efficiently classifying and orchestrating service levels for all the data objects in an entity's system.
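The limitation described above can be illustrated with a minimal sketch. The category names, service values, and the "strictest requirement wins" rule below are hypothetical assumptions chosen only for illustration; they are not taken from any particular information management system. The sketch contrasts a one-dimensional classification, which consults a single category, with one that considers every line of business associated with an object.

```python
# Hypothetical service requirements per line of business (illustrative values only).
REQUIREMENTS = {
    "engineering": {"backup": "weekly", "retention_years": 3},
    "legal": {"backup": "daily", "retention_years": 7},
}

# Assumed ordering of backup levels, from least to most demanding.
BACKUP_RANK = {"none": 0, "weekly": 1, "daily": 2}


def one_dimensional(categories):
    """Consult only the first category; the other categories are ignored."""
    return dict(REQUIREMENTS[categories[0]])


def multi_category(categories):
    """Combine requirements from every category, keeping the strictest value.

    The 'strictest wins' rule is an assumption made for this sketch.
    """
    combined = {"backup": "none", "retention_years": 0}
    for name in categories:
        req = REQUIREMENTS[name]
        if BACKUP_RANK[req["backup"]] > BACKUP_RANK[combined["backup"]]:
            combined["backup"] = req["backup"]
        combined["retention_years"] = max(
            combined["retention_years"], req["retention_years"]
        )
    return combined


# An engineering file that also has legal implications.
shared_file = ["engineering", "legal"]
print(one_dimensional(shared_file))
print(multi_category(shared_file))
```

Under these assumed values, the one-dimensional result reflects only the engineering requirements, so the shared file would receive less backup and retention than the legal line of business requires.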
Various information management systems and methods exist, some of which address these and other factors. One difficulty in designing an information management system, however, is that the amount of electronic data that can be managed by the information management system is limited by the physical and processing constraints of the hardware implementing the information management system. For instance, an information management server is necessarily limited in the amount of electronic data it can manage by, among other things, the speed and size of its processors and other hardware. Thus, the scalability of an information management system is an important consideration for entities desiring to implement information management in a network.
Two conventional scaling solutions often implemented include scaling up and scaling out. Scaling up, for example, includes implementing the information management system in a server with faster hardware. Often, however, the cost of scaling up can be prohibitively high. Scaling out includes either implementing the information management system in redundant servers, with each server managing a subset of a network, or partitioning low-level information management functions out to other servers. In the former case, the use of multiple information management servers can present integration difficulties. In the latter case, high-level functions not partitioned to the other servers still require significant computing resources from the information management server.
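The first form of scaling out, in which each server manages a subset of a network, can be sketched as follows. The function name, the host names, and the round-robin assignment rule are hypothetical and serve only to illustrate the idea of dividing a network among several servers; conventional systems may divide a network in other ways.

```python
def partition_hosts(hosts, server_count):
    """Assign each host to one of server_count servers in round-robin order.

    Hosts are sorted first so that the assignment is deterministic.
    """
    assignments = {server: [] for server in range(server_count)}
    for index, host in enumerate(sorted(hosts)):
        assignments[index % server_count].append(host)
    return assignments


# Hypothetical host names divided between two information management servers.
print(partition_hosts(["host-c", "host-a", "host-b"], 2))
```

Each server in such an arrangement sees only its own subset of hosts, which suggests why combining results across servers can present the integration difficulties noted above.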
What are needed, therefore, are improved methods and systems for managing electronic data in a network.