1. Technical Field
The invention relates to electronic access to information. More particularly, the invention relates to a method and apparatus for identifying, extracting, capturing, and leveraging expertise and knowledge.
2. Description of the Prior Art
For years, enterprises have struggled with ineffective search techniques. Compared to what is available on the public Web via services such as Google and Yahoo, there remains a dearth of highly relevant search solutions for content within the enterprise. With dozens to hundreds of independent application and repository information silos in each company, the difficulty of finding critical business information costs businesses hundreds of billions of dollars each year [source: A. T. Kearney]. Today, CIOs and business executives are revisiting enterprise search as one of the top business/IT challenges for the next few years.
Enterprise search needs are poorly served. Various search technologies have been developed to attack the challenge of searching the Web, searching individual users' computers (PC/desktop), and searching internal business documents (enterprise). Each of these approaches is unique, but none provides an adequate solution for the enterprise.
PC (Desktop) Search
PC or desktop search can be compared with finding something in a messy garage. You know you have it somewhere but cannot find it, so locating it is the only goal. When you do find it, you are the sole judge of whether you have found the right content or document, because you collected or wrote the content in the first place. You are the only expert and authority that matters.
Traditional PC search from Microsoft is based on parsing files at the time of search. It is slow and can only find things in isolated places, such as file folders or email directories. The latest PC search introduces inverted index technology from Google, soon to be available also from Yahoo, Ask Jeeves, and Microsoft. These products begin to solve the speed and silo problems so that users can find information across personal file systems, Outlook or other email systems, calendars, and other parts of the desktop environment.
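The difference between parsing files at search time and consulting an inverted index can be illustrated with a minimal sketch. It is not taken from any of the products named above; the function names, the whitespace tokenization, and the sample documents are assumptions made only for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it.

    The index is built once, ahead of time, so that a later search does not
    have to re-parse every file.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical desktop content drawn from different silos (files, email, calendar).
documents = {
    "notes.txt": "java programming notes",
    "mail-001": "meeting about java release",
    "calendar-17": "release planning meeting",
}
index = build_inverted_index(documents)
print(search(index, "java"))
print(search(index, "release meeting"))
```

Because a lookup touches only the index entries for the query words, the cost of a search no longer grows with the need to open every file, and one index can cover several silos at once.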
Web Search
At the other end of the search spectrum is Web search. There, the story is more like driving in Boston for the first time. You are not necessarily an expert on the topics you are looking for, and you are learning a new subject. Sometimes, you search to find new services such as weather, travel, or shopping. With Web search, you are counting on millions of people on the Web to help you, and you do not necessarily know or care who the real expert or authority is. As a result, you sometimes get bad advice or may shop in the wrong places.
Web search before Google relied only on technologies such as inverted indexes, natural language processing (NLP), and database indexes. These worked adequately, but not as well as an approach that also counts the number of links that point at a page. As more sites link to your page, your page becomes more important, simply because the webmasters behind those sites have gone to the trouble of adding links to your page. Hence the birth of page-ranking[tm] and the success of Google's business.
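The link-counting idea described above can be sketched in a few lines. This is only the simple inbound-link count that the paragraph describes, not Google's actual ranking algorithm; the function name and the example sites are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by the number of distinct pages that link to them.

    links: iterable of (source_page, target_page) pairs.
    Returns a list of (page, inbound_count), highest count first.
    """
    # Ignore duplicate links and links from a page to itself.
    unique = {(source, target) for source, target in links if source != target}
    counts = Counter(target for _, target in unique)
    # Sort by descending count, then by page name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link data.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "b.example"),
    ("a.example", "c.example"),  # duplicate, counted once
    ("d.example", "d.example"),  # self-link, ignored
]
ranking = rank_by_inbound_links(links)
print(ranking)
```

A page that nobody links to never appears in the ranking at all, which is the situation for most enterprise content, where such links are rare.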
Enterprise Search
The enterprise, however, does not behave like the PC or the Web environment. Imagine you are looking for a book to learn Java programming. You know your ultimate goal, but there are hundreds of books about Java, and the one you read has to be exactly right. So a discovery process is needed to find the right reference content, or information known by other experts in the company. The ultimate judge of good search results for an enterprise extends beyond just yourself. These arbiters of good results could be your peers or the experts that you depend on to do your job.
An example of the problem with enterprise search is shown in FIG. 1, which is a flow diagram showing the state of the art in enterprise search. In the example of FIG. 1, Dave is searching for particular information and retrieves 2,800 documents. Dave finds no useful result in the top ten results returned, so he calls Sam. Sam, in turn, searches and, finding nothing, e-mails marketing. Mark and Tina in marketing search and find nothing as well. Mark calls Eric, Nancy, and Ganesh, and the answer is found in Ganesh's design document. Tina calls Eric, Nancy, and Ganesh again, and everybody is now upset. Clearly, it would have been more useful for Dave if he had found Ganesh's design document in his initial search. In fact, the document may have been among the 2,800 documents located, but it was not possible for Dave to identify it as the most useful document.
Traditional enterprise search technology uses inverted index, NLP, and database index approaches (see FIG. 2). The major problem is that the current engine throws hundreds to thousands of search results per query back to the user. Anything that looks like Java or programming is mixed together for you to see. Much like email spam, search engines spam the user with numerous out-of-date, irrelevant, unofficial, siloed, contradictory, and unauthorized results. Users give up quickly and resort to much more expensive ways to get the information, including calling, emailing, chatting, or worse, starting to recreate, make up, or give up on information that already exists.
Enterprise Search Exhibits a Unique Set of Characteristics
By comparing the key issues in enterprise search with those of Web or PC search, it can be concluded that enterprise search is unique and in direct contrast to Web search. In fact, what works for Web search does not and will not work for enterprise search, and vice versa. Five key attributes are considered in this regard: search guide, user behavior, freshness and credibility of the content, user homogeneity, and privacy concerns.
Primary Guide
On the Web, for example, Google's success has depended on page ranking as the primary guide. While page ranking has been effective in bringing some sanity to the Web, it will not have the same effect on enterprise content search. Firstly, enterprise content lacks the large number of links needed to provide the guiding effect of page ranking, nor are there incentives for enterprises to create these links on a sustainable basis. Secondly, the real goal of page ranking is to find traces of human effort that indirectly indicate subject authority, because it is next to impossible to find the real experts in the vast universe of the Web. In an enterprise, you should not need to guess indirectly who the experts might be. You know who the trusted experts are, you hire them, and they work day in and day out in the company as specialists in their domain areas. Enterprise search should rely on them as subject authorities for relevant guidance and ranking.
User Behavior
User behavior is completely different between the enterprise and the Web. We as individuals on the Web have more faces than we might know. We could be men, fathers, sons, husbands, brothers, golfers, travelers, rock musicians, investors, and hundreds of other profiles all at the same time. When we search on the Web, the search tends to be one-off and all over the place. Also, the keywords we type in tend to be the search goals themselves. When we type in “weather,” we are looking for weather information. User feedback on the Web is not reliable because only a very small group of loud users have the time to give feedback, and they therefore skew the search results with their non-representative bias.
Enterprise search, however, tends to repeat itself quickly based on the user's role and the situations he is in. When one sales person is looking for some sales collateral, other sales people responsible for the same products in the same region are very likely in need of the same information. Equally important is the fact that a person who may have 300 roles and profiles in his personal life has a much smaller number of work roles, e.g. a half dozen at most. He might be an engineer working in the Paris office who is also a member of the cross-functional cultural committee. It is also important to note that the keywords in enterprise searches are more like hints, even fishing bait, for the documents a person is looking for. It is thought that eighty percent of people seek information they have seen before. Given the predictability of enterprise users, we can safely rely on self-motivated actions and behaviors to collect unbiased feedback.
Freshness and Credibility
Web search rewards older content by ranking it higher. The longer the content has been sitting there, the more likely it will be found, because others have had time to discover it and link to it.
Enterprises want to behave differently. Fresh content reflects new business situations and, therefore, must be ranked higher so that more people see it. By responding to fresh content quickly, business agility is assured. A piece of content that is one week old may be better than one that is a year old, but it is not good at all if today's content is available and shows something different from the week-old content. Enterprise search users do not want good enough content. They require the search result to be exactly right.
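The freshness preference described above can be sketched as an ordering of matching results by modification date. This is an illustrative assumption only and not a complete ranking method; the function name, the document titles, and the dates are hypothetical.

```python
from datetime import date


def rank_by_freshness(matches):
    """Order matching documents so that the most recently modified comes first.

    matches: list of (title, last_modified) pairs, where last_modified is a date.
    """
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Hypothetical results for one query; all three documents match the keywords.
matches = [
    ("price list (last year)", date(2004, 3, 1)),
    ("price list (last week)", date(2005, 2, 22)),
    ("price list (today)", date(2005, 3, 1)),
]
ranking = rank_by_freshness(matches)
for title, modified in ranking:
    print(modified.isoformat(), title)
```

In this ordering the week-old document is ranked above the year-old one, but both fall below today's version, matching the example in the text.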
Homogeneity
The Web or consumer world is very heterogeneous, while an enterprise is the opposite: homogeneous, or more precisely, segmented homogeneous. This means that different departments or groups (sales vs. marketing vs. engineering) in a company might be different (segmented), but within a group, people are very similar or homogeneous in the way they work, regardless of how different their profiles are.
The implication of this split in attributes is profound. In a large heterogeneous world with millions of people involved, statistics is the only known technique for understanding what people like, want, etc. Web search correctly relies on statistics to find not-so-precise information for its users. The enterprise, again, is different. With small sample populations and homogeneous groups, statistics do not work. To understand the members of a group, you need to know their likes and dislikes. No predictions, just awareness.
With this understanding of enterprise characteristics, it is seen that enterprise search needs to focus on subject authorities, repeated role-based work patterns, fresh and official content, and group know-how (a group's collective knowledge and expertise to do a job). Re-examining traditional IR-based (information retrieval) search, we realize that it focuses on the opposite. It relies on the whole content population (crawl and index it) instead of subject authorities, on word or linguistic patterns instead of work patterns, on older existing content instead of fresh or official content, and on statistical trends to predict instead of group similarity to know. There is thus a need for techniques that focus on the correct key characteristics of the enterprise.
The problem with enterprise search technology has become acute for many CIOs and business executives. In the inventor's own limited surveys of a dozen CIOs and business executives, people ranked the priority of the enterprise search problem as a 9-10 out of 10. The challenge of traditional full-text engines is poor relevancy. They are good for everything (all content) and good for nothing (irrelevant results) at the same time. NLP technology achieves better relevancy by focusing on one application and one domain, where human language becomes more deterministic. The problem with NLP is that the solution is placed in a silo and is good only within that specific application, while enterprises operate hundreds to thousands of applications. It is not possible for employees to log on to that many systems one by one to look for information. Both classes of solutions also suffer from an inability to adapt to changes once deployed. Taxonomies and structures change quickly over time in enterprises.
Current search software also suffers from the traditional enterprise software model, with its inherited, expensive product architecture and design and its expensive marketing and sales model. A typical enterprise search deployment costs from $500K to several million dollars after considering software licenses, services, training, and other related costs.
It would therefore be advantageous to transform how enterprise search technologies are bought and deployed while improving the cost and quality of search.