1. Field
This disclosure is generally related to content analysis. More specifically, this disclosure is related to calculating similarities between semantic entities.
2. Related Art
The proliferation of electronic mail (email) has greatly affected people's everyday lives, especially their working lives. Modern workers spend, on average, one to two hours of their workday on email: reading, ordering, sorting, and writing messages. It is very common for an email user to receive tens, or even hundreds, of emails every day. Many of these emails carry important information that may need to be retrieved at a later time. However, a user's cluttered inbox often makes retrieving such information difficult.
To help email users better organize their email messages, various email applications provide different solutions. For example, users of Outlook® (registered trademark of Microsoft Corporation of Redmond, Wash.) can apply various rules to incoming emails in order to sort them into different folders. In addition, Outlook® can aggregate email messages into conversations by matching subject lines or senders/recipients. Note that an email conversation is a set of related messages generated by the “reply” operation. Gmail™ (trademark of Google Inc. of Mountain View, Calif.) allows its users to apply labels to messages in order to categorize them. Hence, a user can place all emails related to a task within a single folder, or apply a single label to these emails. As a result, if the user ever needs to retrieve information related to a task, the user can go to the corresponding folder or click on the corresponding label. However, these approaches require manual input from the user, which can be cumbersome and time-consuming. In addition, in scenarios where no explicit rule or label can be applied to a message, or where the sender of a message does not use the reply function, the user may find it difficult to retrieve related messages.