Clustering or classifying text is challenging because it is a fine-grained task. In domains such as finance, for example, texts may include many domain-specific words that occur with high frequency and have high term frequency-inverse document frequency (tf-idf) values. Traditional document representations, such as bag-of-words (BOW) based on word counts or tf-idf, are biased toward frequent words, so the real semantics of a document can easily be buried. For example, a document may be represented by a bag-of-words vector in which each dimension denotes a word feature and its value is that word's frequency or tf-idf. Such a representation is sparse and high-dimensional, with dimensionality equal to the size of the vocabulary, and the word features are treated as completely independent of one another. Other embedding representations of documents do not capture the semantics of a document accurately and lose much of the document's information during the machine learning or training process.
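The bag-of-words representation described above can be illustrated with a minimal sketch. The three-sentence corpus, the whitespace tokenization, the helper names (`bow_vector`, `tfidf_vector`), and the unsmoothed idf variant log(N/df) are all assumptions chosen for illustration and are not taken from the original text; practical systems typically use a library implementation with smoothing and normalization.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
docs = [
    "the bank raised the interest rate",
    "the fund cut the dividend",
    "the bank cut the interest rate again",
]

# Whitespace tokenization; the vocabulary is every distinct word in the corpus.
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})


def bow_vector(tokens):
    """Count-based bag-of-words vector: one dimension per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def tfidf_vector(tokens):
    """tf-idf vector using raw term counts and idf = log(N / df) (assumed variant)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vec = []
    for w in vocab:
        df = sum(1 for doc in tokenized if w in doc)  # document frequency of w
        vec.append(counts[w] * math.log(n_docs / df))
    return vec


bow = [bow_vector(t) for t in tokenized]
tfidf = [tfidf_vector(t) for t in tokenized]

# Each vector has as many dimensions as the vocabulary has words,
# and most entries are zero for any single document.
print(len(vocab), bow[0])
print([round(x, 3) for x in tfidf[0]])
```

Even in this small case, each document vector has one dimension per vocabulary word, several entries are zero, and the dimensions for related words such as "bank" and "fund" carry no information about each other, which is the independence limitation noted above. With a realistic vocabulary of tens of thousands of words, the sparsity becomes far more pronounced.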