The present invention relates to machine learning based on natural language contents, and is also applicable to non-natural language contents.
More specifically, the present invention relates to machine-learning technologies for identifying associations between terms or symbols in textual contents.
Conventional approaches in the field focus mainly on statistical methods and are mostly applied to numerical data. Other machine-learning approaches that work with textual data usually do not take context information into consideration, because context is difficult to identify in text contents, and the results of such approaches need to be improved.
Much information is contained in text contents, such as text documents, emails, and other user-generated contents. Various theoretical and practical attempts have been made to efficiently understand, classify, and determine the amount and relevancy of the information in natural language contents. The existing techniques, including various search engines, spam filters, fraud detectors, and document classification systems, however, are often not sufficiently accurate in understanding the content and the relationships between the concepts contained in the text contents, and thus often cannot effectively serve the information needs of their users. There is still a need for accurate, efficient, and automated technologies to identify, search, rank, and classify large amounts of natural language contents based on the meaning of the contents and the amount of information they contain.