Embodiments herein generally relate to methods for accessing a dataset and, more particularly, to a method that accesses customer requests based on the strengths of semantic links between customers, products, and customer requests.
Link analysis methodologies play a key role in Web search systems. They exploit the fact that the Web link structure conveys the relative importance of Web pages. The HITS methodology (J. Kleinberg, “Authoritative Sources in a Hyperlinked Environment”, Journal of the ACM, Vol. 46, No. 5, pp. 604-632, 1999) relies on query-time processing to deduce the hubs and authorities that exist in a subgraph of the Web consisting of both the results to a query and the local neighborhood of these results.
Google's PageRank (L. Page, S. Brin, et al., “PageRank Citation Ranking: Bringing Order to the Web”, Technical Report, Stanford University, Stanford, Calif., 1998) pre-computes a ranking vector that provides a priori “importance” estimates for all of the pages on the Web. This vector is computed once, offline, and is independent of the search query. At query time, these importance scores are used in conjunction with query-specific IR (Information Retrieval) scores to rank the query results.
Several enhanced PageRank methodologies have been developed recently, such as a weighted PageRank (X. M. Jian, et al., “Exploiting PageRank Analysis at Different Block Level”, in Proc. of the WISE 2004 Conference), a two-layer PageRank (J. Wu, et al., “Using a Layered Markov Model for Decentralized Web Ranking”, EPFL Technical Report IC/2004/70, Aug. 19, 2004), a hierarchical PageRank (G. R. Xue, et al., “Exploiting Hierarchical Structure for Link Analysis”, the 28th Int. ACM SIGIR Conference, 2005), and the topic-sensitive PageRank (T. H. Haveliwala, “Topic-Sensitive PageRank”, in Proc. of the 11th Int. World Wide Web Conference, May 2002).
The above methods consider only the explicit graph-topological links (in either flat or hierarchical networks) residing in web pages, and most of them generate a single page-ranking vector. The linguistic topic structure used by T. H. Haveliwala, above, serves only to bias the ranking scores toward different topics; it does not add any “semantic-link” structure to the web pages. Although Haveliwala's method can compute multiple ranking vectors, those vectors still rank web pages, merely biased by different topics. None of the existing ranking methodologies is sufficient to handle the prioritization of customer requests effectively. This is because analyzing customer requests is a strongly domain-driven problem: the link-based relationships embedded in the requests go well beyond explicit hyperlinks and involve much more complex inter-related networks that contain both hyperlinks and semantic links.
To address issues related to accessing a dataset of customer requests, disclosed herein is a semantic-based ranking methodology that identifies important feature requests submitted by customers. The methodology is akin to the page-ranking methodologies used by Internet search engines, in which an important page is one that is linked to by other pages that are themselves ranked as important.
In this disclosure, a customer request can comprise, for example, three central components: the textual request (the text of the request itself), the identification of the customer who made, or is associated with, the request, and the product or products that are the subject of the request. Using semantic indexing and domain knowledge, links among these three categories are created and strengthened based on semantic similarity. The association between products and requests can be represented as a matrix. From this matrix, two rank scores are generated, one for customer requests and a second for products. These two scores reinforce each other and converge when their generation is iterated. As a result, the largest resulting values identify the highest-ranked requests and products, in the same way that the largest page-rank values identify the highest-ranked pages.
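The mutual reinforcement between request scores and product scores can be sketched as a HITS-style power iteration. This is a minimal sketch assuming NumPy and a hypothetical request-by-product link-strength matrix `A` (rows are requests, columns are products); the function name `mutual_rank`, the unit-length normalization, and the convergence tolerance are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np


def mutual_rank(A: np.ndarray, max_iter: int = 100, tol: float = 1e-10):
    """HITS-style mutual reinforcement between request and product scores.

    A[i, j] is the (hypothetical) link strength between request i and product j.
    A request scores highly when it links to high-scoring products, and a
    product scores highly when high-scoring requests link to it.
    Assumes A is non-negative and not all zeros.
    """
    n_req, n_prod = A.shape
    r = np.full(n_req, 1.0 / np.sqrt(n_req))    # request scores, unit length
    p = np.full(n_prod, 1.0 / np.sqrt(n_prod))  # product scores, unit length
    for _ in range(max_iter):
        p_new = A.T @ r                 # products gain from linked requests
        p_new /= np.linalg.norm(p_new)
        r_new = A @ p_new               # requests gain from linked products
        r_new /= np.linalg.norm(r_new)
        done = (np.allclose(r, r_new, rtol=0.0, atol=tol)
                and np.allclose(p, p_new, rtol=0.0, atol=tol))
        r, p = r_new, p_new
        if done:                        # both score vectors have converged
            break
    return r, p
```

With `A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])`, request 1 (linked to both products) receives the highest request score, and product 0 (linked to three requests) receives the highest product score.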
One exemplary method embodiment pre-processes customer requests that are maintained in a dataset to create a matrix between products and the customer requests. Again, each of the customer requests comprises at least a customer identification, a textual request, and a product identification related to the textual request. After such pre-processing of the dataset, the method can respond to queries of the dataset using the matrix.
The pre-processing of the customer requests can include several steps. For example, the pre-processing can identify explicit links among the customers, textual requests, and products maintained in the dataset. These explicit links are based on the customer, the textual request, and the product identified by each individual customer request.
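A minimal sketch of extracting explicit links, assuming each request is a record holding a request ID, a customer ID, its text, and one or more product IDs. The `CustomerRequest` type and the `explicit_links` function are hypothetical; each link gains strength 1.0 per supporting request, and customer-product links are derived from the customer's own requests, which is one possible reading of the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRequest:
    """Hypothetical record layout for one customer request."""
    request_id: str
    customer_id: str
    text: str
    product_ids: tuple[str, ...]  # product(s) the request is about


def explicit_links(requests):
    """Return (customer-request, request-product, customer-product) link strengths.

    Each link adds 1.0 per request that supports it, so a customer who files
    several requests about the same product gets a stronger customer-product link.
    """
    cust_req = defaultdict(float)
    req_prod = defaultdict(float)
    cust_prod = defaultdict(float)
    for req in requests:
        cust_req[(req.customer_id, req.request_id)] += 1.0
        for pid in req.product_ids:
            req_prod[(req.request_id, pid)] += 1.0
            cust_prod[(req.customer_id, pid)] += 1.0
    return dict(cust_req), dict(req_prod), dict(cust_prod)
```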
In addition, the pre-processing can identify implicit links between the customers, the textual requests, and the products maintained in the dataset, based on semantic similarities. More specifically, the semantic similarities can be based on previously established relationships between terms or phrases, market segments, product family categories, business strategies, etc.
For example, the semantic similarities can be wording similarities based on wording classifications and/or text mining. Alternatively, the semantic similarities can be similarities between products based on a hierarchical product family structure, or similarities between customers based on market segments.
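The implicit links can be sketched with simple stand-ins for each kind of semantic similarity: cosine similarity over word counts for request text (in place of a full semantic index or text-mining pipeline), shared-prefix overlap of paths in a product family hierarchy, and segment equality for customers. All function names, the tokenization, and the 0.3 threshold are hypothetical choices for illustration only.

```python
import math
import re
from collections import Counter


def text_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts (stand-in for semantic indexing)."""
    ta = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    tb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    return dot / norm if norm else 0.0


def product_similarity(path_a, path_b) -> float:
    """Fraction of a product-family path (e.g. family, line, model) two products share."""
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    depth = max(len(path_a), len(path_b))
    return shared / depth if depth else 0.0


def customer_similarity(segment_a, segment_b) -> float:
    """Customers in the same market segment are treated as fully similar."""
    return 1.0 if segment_a == segment_b else 0.0


def implicit_links(items, sim, threshold=0.3):
    """Link every pair of items whose similarity is at least `threshold`.

    `items` maps an ID to the attribute `sim` compares (text, path, or segment).
    """
    keys = sorted(items)
    links = {}
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            s = sim(items[a], items[b])
            if s >= threshold:
                links[(a, b)] = s
    return links
```

For instance, "Add duplex printing" and "Faster duplex scanning" share one of three words each, giving a text similarity of 1/3, just above the assumed threshold.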
The pre-processing can rank the importance of the customers, the textual requests, and the products based on the strengths of the explicit links and the implicit links. High-ranking customers are thereby identified as important customers, high-ranking textual requests as important textual requests, and high-ranking products as important products. These importance rankings of the customers, the textual requests, and the products are recorded in the matrix. Further, the ranking process awards more value to links to and from important customers, important textual requests, and important products.
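One way to realize a ranking in which links from important nodes confer more value is a weighted PageRank-style propagation over a single graph holding customers, requests, and products. The sketch below assumes NumPy, undirected links keyed by node-ID pairs, and a hypothetical weight `alpha` that discounts implicit links relative to explicit ones. PageRank is a swapped-in stand-in here, since the text does not specify the exact update rule.

```python
import numpy as np


def weighted_rank(nodes, explicit, implicit, alpha=0.5, damping=0.85,
                  max_iter=200, tol=1e-12):
    """Weighted PageRank-style ranking over customers, requests, and products.

    `explicit` and `implicit` map (node_a, node_b) pairs to link strengths.
    Implicit (semantic) links are scaled by the hypothetical weight `alpha`.
    Each node spreads its score over its links in proportion to link strength,
    so a strong link from a highly ranked node contributes the most.
    """
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    W = np.zeros((n, n))
    for links, scale in ((explicit, 1.0), (implicit, alpha)):
        for (a, b), strength in links.items():
            i, j = index[a], index[b]
            W[i, j] += scale * strength   # links are treated as undirected
            W[j, i] += scale * strength
    deg = W.sum(axis=0)
    # Column-stochastic transition matrix; isolated nodes spread evenly.
    M = np.where(deg > 0, W / np.where(deg > 0, deg, 1.0), 1.0 / n)
    score = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1.0 - damping) / n + damping * (M @ score)
        delta = np.abs(new - score).sum()
        score = new
        if delta < tol:
            break
    return dict(zip(nodes, score))
```

The scores sum to one, so they can be compared directly across customers, requests, and products; a request linked to several well-connected products and customers rises above requests with fewer or weaker links.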
Because the pre-processing ranks the customers, products, and textual requests according to their explicit and implicit links, the response to a query can display the matches in order of their importance rankings. To aid in this process, the queries can be transformed into semantic queries.
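Query answering can be sketched as expanding the query terms into a semantic query and then sorting the matching requests by their precomputed rank. The `SYNONYMS` table is a hypothetical stand-in for the domain knowledge and semantic indexing described above, and `rank` is assumed to map request IDs to importance scores produced during pre-processing.

```python
import re

# Hypothetical domain thesaurus standing in for semantic indexing.
SYNONYMS = {
    "duplex": {"two-sided", "double-sided"},
    "scan": {"scanning", "scanner"},
}


def _tokens(text: str) -> set[str]:
    """Lower-case word tokens; hyphenated terms are kept whole."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def to_semantic_query(query: str) -> set[str]:
    """Expand each query term with its (hypothetical) related terms."""
    terms = _tokens(query)
    expanded = set(terms)
    for t in terms:
        expanded |= SYNONYMS.get(t, set())
    return expanded


def answer_query(query, request_texts, rank):
    """Return IDs of requests matching the semantic query, most important first."""
    q = to_semantic_query(query)
    matches = [rid for rid, text in request_texts.items() if q & _tokens(text)]
    return sorted(matches, key=lambda rid: rank.get(rid, 0.0), reverse=True)
```

For example, a query for "duplex" also matches a request worded as "two-sided printing", and the two matches are listed by importance rather than by literal term overlap.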
These and other features are described in, or are apparent from, the following detailed description.