Collection, integration, and analysis of large quantities of data are key activities for intelligence analysts, search engines, and other entities that handle large volumes of information. Such activities commonly lack adequate automated support. Data signatures are one automation tool: a data signature can refer to statistically identifiable characteristics of data that can be used to differentiate a specific subset of data from other, similar data. Often, data signatures are calculated from the vocabulary of the documents. However, such vocabulary-based data signatures can sometimes fail to capture relationships between concepts and to differentiate the documents' semantics. For example, consider two documents, one about smoking as a health hazard and one about methods to quit smoking. Even though the documents might differ significantly, their data signatures can be very similar because the documents may share many terms. Accordingly, a need exists for processes, data structures, and apparatuses for representing knowledge that can consider the context and/or task of the user when representing data, obtain information about the data at a semantic level, and allow applications to compare knowledge with one another.
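
The smoking example above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a data signature is a term-frequency count over a document's words and that two signatures are compared by cosine similarity; the text does not specify either choice. The function names `vocabulary_signature` and `cosine_similarity` and the three sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vocabulary_signature(text):
    """Hypothetical vocabulary-based signature: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(sig_a, sig_b):
    """Cosine of the angle between two term-frequency signatures."""
    # A Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * sig_b[term] for term, count in sig_a.items())
    norm_a = math.sqrt(sum(c * c for c in sig_a.values()))
    norm_b = math.sqrt(sum(c * c for c in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample documents (assumptions, not taken from any real corpus).
hazard_doc = (
    "Smoking cigarettes is a health hazard. "
    "Smoking damages the lungs and the heart."
)
quit_doc = (
    "Quitting smoking improves health. "
    "Methods to quit smoking cigarettes protect the lungs and the heart."
)
unrelated_doc = "The committee approved the annual budget for road repairs."

hazard_sig = vocabulary_signature(hazard_doc)

# Different topics, heavily overlapping vocabulary.
similarity_smoking = cosine_similarity(hazard_sig, vocabulary_signature(quit_doc))

# Different topic, almost no shared vocabulary.
similarity_unrelated = cosine_similarity(
    hazard_sig, vocabulary_signature(unrelated_doc)
)

print(f"hazard vs. quit:      {similarity_smoking:.2f}")
print(f"hazard vs. unrelated: {similarity_unrelated:.2f}")
```

Under these assumptions, the two smoking documents score far closer to each other than either does to the unrelated document, even though one describes a hazard and the other describes cessation methods. A signature built only from word counts has no way to register that difference in meaning.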