1. Technical Field
The present teaching generally relates to organizing, retrieving, presenting, and utilizing information. Specifically, the present teaching relates to methods and systems for associating data from different sources and searching data.
2. Discussion of Technical Background
The Internet has made it possible for a person to electronically access virtually any content at any time and from any location. Internet technology facilitates information publishing, information sharing, and data exchange in various spaces and among different persons. One problem associated with the rapid growth of the Internet is the so-called “information explosion,” which is the rapid increase in the amount of available information and the effects of this abundance. As the amount of available information grows, the problem of managing the information becomes more difficult, which can lead to information overload. With the explosion of information, it has become more and more important to provide users with information from a public space that is relevant to the individual person and not just information in general.
In addition to the public space such as the Internet, semi-private spaces including social media and data sharing sites have become another important source where people can obtain and share information in their daily lives. The continuous and rapid growth of social media and data sharing sites in the past decade has significantly impacted the lifestyles of many; people spend more and more time chatting and sharing information with their social connections in the semi-private spaces, or use such semi-private sources as additional means for obtaining information and entertainment. Similar to what has happened in the public space, information explosion has also become an issue in the social media space, especially in managing and retrieving information in an efficient and organized manner.
Private space is another data source used frequently in people's everyday lives. For example, personal emails in Yahoo! mail, Gmail, Outlook, etc., and personal calendar events are considered private sources because they are accessible to a person only when she or he logs in using private credentials. Although most information in a person's private space may be relevant to the person, it is organized in a segregated manner. For example, a person's emails may be organized by different email accounts and stored locally in different email applications or remotely at different email servers. As such, to get a full picture of a situation related to, e.g., some event, a person often has to search different private spaces to piece everything together. For example, to confirm a friend's actual arrival time for a dinner, one may have to first check a particular email (in the email space) in which the friend indicated when she or he would arrive, and then go to Contacts (a different private space) to search for the friend's contact information before calling the friend to confirm the actual arrival time. This is not convenient.
The segregation of information occurs not only in the private space, but also in the semi-private and public spaces. Combined with the information explosion, this has led to a further problem: because pieces of information that are in fact related remain isolated in different segregated spaces without meaningful connections among them, one must constantly look for information across those spaces to piece everything together.
Efforts have been made to organize the huge amount of available information to assist a person in finding relevant information. Conventional schemes for such efforts are application-centric and/or domain-centric. Each application carves out its own subset of information in a manner that is specific to the application and/or specific to a vertical or domain. For example, such an attempt is either dedicated to a particular email account (e.g., www.Gmail.com) or specific to an email vertical (e.g., Outlook); a traditional web topical portal allows users to access information in a specific vertical, such as www.IMDB.com in the movies domain and www.ESPN.com in the sports domain. In practice, however, a person often has to go back and forth between different applications, sometimes across different spaces, in order to complete a task because of the segregated and unorganized nature of information existing in various spaces. Moreover, even within a specific vertical, the enormous amount of information makes it tedious and time consuming to find the desired information.
Another line of effort is directed to organizing and providing information in an interest-centric manner. For example, user groups of social media in a semi-private space may be formed around common interests among the group members so that they can share information that is likely to be of interest to each other. Web portals in the public space have started to build user profiles for individuals and recommend content based on an individual person's interests, either declared or inferred. The effectiveness of interest-centric information organization and recommendation relies heavily on the accuracy of user profiling. Oftentimes, however, a person may prefer not to declare her/his interests, whether in a semi-private space or a public space. In that case, user profiling can rely only on estimation, whose accuracy can be questionable. Accordingly, none of the application-centric, domain-centric, or interest-centric approaches works well in dealing with the information explosion challenge.
FIG. 1 depicts a traditional scheme of information organization and retrieval in different spaces in a segregated and disorganized manner. A person 102 has to interact with information in private space 104, semi-private space 106, and public space 108 via unrelated and separate means 110, 112, 114, respectively. For accessing private data from the private space 104, means 110, such as email applications, email sites, local or remote Contacts and calendars, etc., has to be selected and used. Each means 110 is domain- or application-oriented, allowing the person 102 to access information related to the domain with the specific application for which the means 110 was developed. Even for information residing within different applications/domains in the private space 104, the person 102 still has to use different means 110 to access the content of each application/domain, which is neither convenient nor person-centric. For example, in order to find the phone numbers of the attendees of a birthday party, the person 102 has to first find all the confirmation emails from the attendees (which may arrive as separate emails and even at different email accounts), write down each name, and open different Contacts to look up their phone numbers.
Similarly, for interacting with the semi-private space 106, a person 102 needs to use a variety of means 112, each of which is developed for and dedicated to a specific semi-private data source. For example, the Facebook desktop application, the Facebook mobile app, and the Facebook site are all means for accessing information in the person 102's Facebook account. But when the person 102 wants to open a document shared on Dropbox by a Facebook friend, the person 102 has to switch to another means dedicated to Dropbox (a desktop application, a mobile app, or a website). As shown in FIG. 1, information may be transmitted between the private space 104 and the semi-private space 106. For instance, private photos can be uploaded to a social media site for sharing with friends; social media or data sharing sites may send private emails to the person 102's private email account notifying her/him of status updates of social friends. However, such information exchange does not automatically create any linkage between data in the private and semi-private spaces 104, 106. Thus, there is no application that can keep track of such information exchange and establish meaningful connections, much less utilize those connections to make it easier to search for information.
As to the public space 108, means 114 such as traditional search engines (e.g., www.Google.com) or web portals (e.g., www.CNN.com, www.AOL.com, www.IMDB.com, etc.) are used to access information. With the increasing challenge of information explosion, various efforts have been made to assist a person 102 in efficiently accessing relevant and on-point content from the public space 108. For example, topical portals have been developed that are more domain-oriented than generic content gathering systems such as traditional search engines. Examples include topical portals on finance, sports, news, weather, shopping, music, art, movies, etc. Such topical portals allow the person 102 to access information related to the subject matter to which each portal is directed. Vertical search has also been implemented by major search engines to help limit the search results to a specific domain, such as images, news, or local results. However, even when the search results are limited to a specific domain in the public space 108, there is still an enormous amount of available information, placing a heavy burden on the person 102 to identify the desired information.
There is also information flow among the public space 108, the semi-private space 106, and the private space 104. For example, www.FedEx.com (public space) may send a private email with a tracking number to a person 102's email account (private space); a person 102 may include URLs of public websites in her/his tweets to followers. In reality, however, it is easy to lose track of related information residing in different spaces. When the information is needed, considerable effort is required to dig it out, based on memory, via the separate means 110, 112, 114 across the different spaces 104, 106, 108. In today's society, this consumes an increasing amount of people's time.
Because information residing in different spaces, or even within the same space, is organized in a segregated manner and can only be accessed via dedicated means, information from different sources (whether from the same or different spaces) cannot be identified and presented in a coherent and unified manner. For example, when a person 102 searches for information using a query in different spaces, the results yielded in different search spaces differ. For instance, the search result from a conventional search engine directed to the public space 108 is usually a search result page with “blue links,” while a search in the email space based on the same query will look completely different. When the same query is used for search in different social media applications in the semi-private space 106, each application will again likely organize and present the search result in a distinct manner. Such inconsistency degrades the user experience. Further, related information residing in different sources is retrieved piecemeal, so that the person 102 must manually connect the dots to form a mental picture of the overall situation.
Therefore, there is a need for improvements over the conventional approaches to organize, retrieve, present, and utilize information.