Typically, featurization in text classification problems uses automated approaches that generate large numbers of features. The most commonly used is “bag-of-words” or bag of n-grams, in which each feature corresponds to the presence or frequency of a specific word or n-word phrase in the document. Conventional bag-of-words approaches produce sparse feature sets with thousands to millions of dimensions. Large feature spaces require more training data to reduce the risk of over-fitting (which degrades classifier performance on new data) and are harder to interpret. Because bag-of-words features and other automatically generated features do not employ human input, there is little opportunity to incorporate a user's domain knowledge. These limitations result in high labeling and maintenance costs.
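
To make the bag-of-n-grams representation concrete, the following is a minimal sketch in Python using only the standard library. The function names (`ngrams`, `bag_of_ngrams`, `vectorize`), the whitespace tokenizer, and the two toy documents are illustrative assumptions, not part of any particular system described here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every n-word phrase in a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Map each 1..max_n-gram in the text to its frequency.

    Assumption: lowercasing plus whitespace splitting stands in for a
    real tokenizer.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def vectorize(docs, max_n=2):
    """Build a shared vocabulary and one count vector per document."""
    bags = [bag_of_ngrams(d, max_n) for d in docs]
    vocab = sorted(set().union(*bags))
    # A Counter returns 0 for terms a document does not contain,
    # which is where the sparsity comes from.
    vectors = [[bag[term] for term in vocab] for bag in bags]
    return vocab, vectors


# Hypothetical toy corpus for illustration only.
docs = ["the cat sat on the mat", "the dog sat"]
vocab, vectors = vectorize(docs)
```

With this toy corpus, two short sentences already yield 13 features (6 words and 7 two-word phrases), and most entries in each vector are zero. On a realistic corpus the vocabulary grows with every new word and phrase, which is how the feature space reaches thousands to millions of dimensions.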