The present invention relates to the field of content management and, more particularly, to inferring sensitive information from tags.
With the popularization of collaborative environments, media such as web logs (i.e., blogs) and discussion forums have become increasingly important to businesses. Employees increasingly participate in internal and external discussions in these environments. In many instances, these environments are a central point where employees and customers discuss business products and/or services. For example, developerWorks® is a Web site encouraging employees to blog about the products, development, and design issues of International Business Machines Corporation (IBM). While these blogs provide valuable open discussions, feedback from customers, and good public relations, potentially sensitive internal information can be inadvertently disclosed to unauthorized persons.
Some of these environments exist on a public network such as the Internet and are therefore publicly accessible. Others are privately accessible, typically residing within an intranet used for internal purposes. Within publicly accessible environments, employees must take care not to accidentally disclose sensitive information. Conversely, in privately accessible environments, employees can discuss any topic in great detail without disclosing sensitive information to unauthorized personnel. To facilitate discussions, these environments are typically integrated with software such as publishing tools, Web sites, collaboration tools, and text exchange. This integration can make public and private environments equally easy to reach, with the transition between them transparent to the user. Thus, these two disparate environments, each having different user security expectations, can be indistinguishable to even a technically adept user. With these two differing levels of security tightly integrated, the danger of sensitive information being disclosed increases drastically.
Additionally, due to the dynamic nature of business, employees can have difficulty determining which information is potentially sensitive or confidential. For instance, information about one project can be considered confidential only until release, whereas information about another project can remain confidential indefinitely. Further, if sensitive information is accidentally divulged, competitors can use the information to gain an advantage. As collaborative environments continue to proliferate, mechanisms for rapidly identifying and protecting sensitive information within these environments become paramount.