1. The Field of the Invention
The present invention relates to information management. More particularly, embodiments of the invention relate to systems and methods for classifying structured and/or unstructured data in a computer system.
2. The Relevant Technology
The world is slowly and continually moving from being paper-based to being electronic-based. Businesses have replaced bulky paper files and expensive storage rooms with electronic files and searchable databases. Taxpayers are encouraged to submit returns electronically rather than in paper form, and email is rapidly becoming the principal form of communication.
There are several reasons for this transition, one of which is the convenience and accessibility of electronic systems. Email, for example, often arrives shortly after it is sent, and information submitted electronically can be quickly formatted, processed, and stored without the inconvenience of manually reviewing each submission. Software programs enable documents, spreadsheets, diagrams, circuits, drawings, etc., to be created, stored, edited, accessed, etc., electronically.
One of the results of the digital nature of data is that most entities today have a large amount of data. New data is being added daily, existing data is often changed, and some data simply ages. And, as entities become more dependent on electronic data, the ability to manage electronic data becomes important for a variety of different reasons. Data security, data backup, data retention, data access control, regulatory compliance, corporate compliance, and the like are examples of why the ability to manage electronic data is important. Further, much of the electronic data maintained by an entity or organization often relates to different aspects of the entity and is often subject to various considerations. Without an effective way to manage the electronic data, it is difficult to apply the appropriate considerations to the data. As a result, providing adequate services in today's data environments is complex.
In addition to these concerns, there is often a large amount of unstructured data, meaning that the value of the data to the entity is not readily known. Consequently, the services required to manage the data are similarly unknown. For example, an entity may have a file storage system that is regularly backed up, despite the presence of files on the system that have little or no value to the entity. Thus, without an effective way to sort, classify, and maintain the data, including files, the entity typically pays for unneeded services and/or has data that receives inadequate services.
Because the data in many systems is inadequately classified, it is difficult to ensure that the appropriate services are being applied. In fact, even when one attempts to classify data, decisions on how to manage the data are complicated by limitations based on the organization of the entity, irrespective of the data. For example, any given entity typically has more than one “line of business.” An engineering firm mainly involved with contract work for the government, for example, often has data that is associated with the actual engineering work being performed. At the same time, the firm may also have data associated with the legal department, human resources, or other administrative aspects of the firm. While some data may belong exclusively to one line of business, other data may be shared between more than one line of business. Some of the data associated with the engineering work, for example, may have legal implications, making it necessary for both lines of business to have access to the data. In other words, a given entity often has various domains of data or different shares of data, which may belong individually to a line of business or may be shared among the various lines of business.
Currently, information management classification systems perform a crawl or read operation in which the classification system discovers and categorizes all the data in the system in order to assign appropriate service levels to each object. One disadvantage of this method, however, is that the data reading process is computationally expensive and requires a large amount of processing time, as each object is read, reviewed, and assigned to a category. Thus, there is a need for a system and method for classifying data that is more computationally efficient and cost-effective.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.