1. Field of the Disclosure
The present disclosure relates to a system and a method for identifying and ranking trending named entities. The present disclosure further relates to such a system and a method for identifying and ranking trending named entities in digital content objects.
2. Description of the Related Art
Social media impact is measured by how much a digital content object, for example a news story, is trending, for instance by counting the number of shares, tweets and other engagements that the object has attracted over a given period of time. For Facebook®, these engagements can mean a share, like or comment; for Twitter®, a tweet or retweet of a link; and for LinkedIn®, a share of the content. Other social network platforms use similar indicia by which user engagement with a digital content object can be registered and tracked.
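The engagement counting described above can be illustrated with a minimal Python sketch. The `Engagement` record, the `count_engagements` function, the field names and the sample data are all hypothetical and chosen for illustration only; they are not drawn from any platform's actual interface.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Engagement:
    """One user interaction with a digital content object (hypothetical record)."""
    object_id: str
    platform: str
    kind: str  # e.g. "share", "like", "comment", "tweet", "retweet"
    timestamp: datetime


def count_engagements(events, object_id, start, end):
    """Count engagements for one object with start <= timestamp < end."""
    return sum(
        1
        for e in events
        if e.object_id == object_id and start <= e.timestamp < end
    )


# Illustrative data only; not taken from any real platform.
events = [
    Engagement("story-1", "facebook", "share", datetime(2024, 1, 1, 9, 0)),
    Engagement("story-1", "twitter", "retweet", datetime(2024, 1, 1, 10, 30)),
    Engagement("story-1", "linkedin", "share", datetime(2024, 1, 2, 8, 0)),
    Engagement("story-2", "facebook", "like", datetime(2024, 1, 1, 11, 0)),
]
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 2)
story_1_count = count_engagements(events, "story-1", window_start, window_end)
```

In this sketch every engagement type carries equal weight and the count is restricted to a single time window; a real system might weight platforms or engagement types differently.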
Tracking and measuring social media user engagements alone does not reveal anything about the content of the digital object itself, other than the degree to which it is trending among users of the platform. In other words, such ranking of digital objects is content-agnostic. As such, tracking and measuring social network engagements alone does not identify which entities in the content of the digital object (for example, a particular celebrity or political figure in a news story) are trending.
Natural language processing (NLP) is a field of computer science, artificial intelligence (AI), and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP has traditionally been related to the area of direct human-computer interaction, for example in Interactive Voice Response systems or speech recognition by a computer. One AI data analysis approach is based on identifying named entities from the natural language elements of digital data. Named entities are persons, organizations, locations or other text elements that can be located and classified into pre-defined categories. Named-entity recognition (NER), also known as entity identification and entity extraction, is an AI task that seeks to locate these text elements in a stream of text and classify them.
An NER system trains a model on a set of training data using machine learning techniques. Namely, before first use on unknown digital data, the NER system learns how to operate from a large amount of manually annotated training data. The trained NER system can then be employed to extract named entities from longer text items.
The prior art discloses a method for analyzing a search engine query to determine those named entities that are the most suitable for input into a search engine to satisfy the query. A ranking model ranks the named entities based on relevance between the query features and corresponding entity features of each named entity. The ranking model includes user context in the ranking, e.g., by taking account of user location and a timeline of events that link the content of the query to features of the named entities that are mentioned in the query. However, if user-context-based ranking does not work for some reason, the ranking model defaults to a popularity ranking of the named entities based on the search history of the general population.
Other prior art discloses a ranking scheme that runs inside a messaging application. When a conversation takes place in the messaging application, the text is analyzed to extract named entities that are mentioned in the conversation. The named entities are then ranked according to their frequency in the conversation, specifically how often a particular named entity is mentioned in the conversation divided by the total number of mentions of all named entities in the conversation.
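The frequency-based ranking described above, in which the number of mentions of a named entity is divided by the total number of mentions of all named entities in the conversation, can be illustrated with a minimal Python sketch. The function name `rank_by_relative_frequency`, the alphabetical tie-breaking rule and the sample mentions are assumptions made for illustration only; the prior art does not specify them, and the sketch assumes the named entities have already been extracted from the conversation.

```python
from collections import Counter


def rank_by_relative_frequency(mentions):
    """Rank entities by their share of all entity mentions in a conversation.

    `mentions` is a list with one item per extracted entity mention.
    Returns (entity, score) pairs, highest score first.
    """
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        return []
    scores = {entity: n / total for entity, n in counts.items()}
    # Assumption: ties are broken alphabetically to keep the order stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative mentions only; not taken from any real conversation.
mentions = ["Paris", "Alice", "Paris", "Acme Corp", "Paris", "Alice"]
ranking = rank_by_relative_frequency(mentions)
```

Here "Paris" accounts for three of the six mentions and therefore ranks first with a score of 0.5. Because each score is a share of the same total, the scores of all entities in a conversation sum to one.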