The present invention relates to the field of data mining. More particularly, the present invention pertains to a method and system for mining a document containing dirty text.