The present invention generally relates to analysis of text received from various sources, and more specifically to detection and validation of concepts extracted from entities embedded in unstructured text.
One of the challenges for text analytics is the identification of terms with ambiguous meaning, a problem addressed by entity recognition and entity linking. For example, the performance of a text analytics system can be negatively impacted by ambiguous terms that appear in text, such as Michael Jordan (the basketball player or the UC Berkeley professor), back (a body part or a preposition) and US (ultrasound or United States). Typical entity recognition and entity linking approaches rely on manually labeled text to train systems to recognize terms in text and then link those terms to the correct concepts. Such labeled training text can be expensive to compile, and updates to the system depend on technical users such as software developers to re-train and enhance the entity recognition and linking systems. Everyday users of these systems ultimately play a passive role in the ecosystem. Particularly in the field of medicine, the users themselves (clinicians) have the most relevant insight into the appropriate use of ambiguous terms, yet they are not currently integrated into the development of these systems and methods. Further, terms and conventions can vary widely from application to application, so it is important to allow the system to evolve within the environment where it is used.
There is a need to improve the efficiency and relevance of the methods and systems associated with text analysis.