The present invention relates in general to Internet and intranet web page service providers and, in particular, to systems and methods for creating search word queries for use with online search engines and searchable content rich databases.
1. Technical Field
The invention is a method and system for creating improved search queries using pre-arranged controlled vocabularies, carefully selected topics, carefully selected word groups, and carefully selected word types. The invention is called a search builder. The search builder is a server-based program, which houses numerous individual topic oriented search builder modules. Each module is focused on a special topic of interest. Each module helps people select the exact terms to be used in a search query. The search query is then transmitted to a search engine or searchable database.
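As a rough illustration of the flow summarized above, the following minimal sketch shows one way a topic module might turn a user's selected terms into a query string and a request URL. The module contents, the function names, and the endpoint `https://search.example.com/search` with its `q` parameter are hypothetical; the specification does not prescribe any of them.

```python
from urllib.parse import urlencode

# Hypothetical topic module: word groups mapped to ordered term lists.
GARDENING_MODULE = {
    "plants": ["tomato", "basil", "rose"],
    "problems": ["aphids", "powdery mildew", "root rot"],
}


def build_query(module, selections):
    """Join the user's selected terms into a single query string.

    `selections` maps a word group name to the terms picked from it.
    Terms outside the controlled vocabulary are rejected, and
    multi-word terms are wrapped in double quotes.
    """
    terms = []
    for group, picked in selections.items():
        allowed = module[group]
        for term in picked:
            if term not in allowed:
                raise ValueError(f"{term!r} is not in word group {group!r}")
            terms.append(f'"{term}"' if " " in term else term)
    return " ".join(terms)


def build_url(base_url, query):
    """Encode the query as a `q` parameter (an assumed parameter name)."""
    return f"{base_url}?{urlencode({'q': query})}"


query = build_query(
    GARDENING_MODULE,
    {"plants": ["tomato"], "problems": ["powdery mildew"]},
)
url = build_url("https://search.example.com/search", query)
```

Restricting selections to the module's own lists is what makes the vocabulary "controlled" in this sketch: the user chooses terms rather than typing them freely.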
The field of search engines is fairly well known. Common search engines include those developed by Google, Verity, Inc., Alta Vista, Fast, Inc., and Lycos. By using a search engine, a user can retrieve needed information on a focused area of interest. The search engine typically retrieves documents satisfying the specified terms in a search query. A browser program is typically used to access the Internet and the myriad of web sites and search engines that are commonly available. Web browsers are also commonly used to access corporate, government or private intranets. The typical web browser includes provisions for navigating a web site through a graphical user interface used both for transmitting search queries and for receiving and presenting search query results. Web browsers can be found in a variety of commercial formats (Internet Explorer, Netscape, Mozilla, etc.).
A typical search query input by a user is processed by an online search engine, which then accesses an indexed database of web pages. The matching pages are sent back to the user as a ranked list of web pages that respond to the user's query words, ordered by whatever algorithm the search engine uses to rank results. The quality of the search results is dependent upon the words that are entered into the search engine.
Most search engines do not provide help or guidance in selecting the specific words to be used in a query. They typically present a graphical advanced search form with empty text boxes, accompanied by written guidance on using the advanced search options, which describes the use of Boolean logic and technical syntax.
Most search engine users tend to use very few words in their search query. Most search word queries submitted by users of search engines contain only one or two words. This produces excessive results with large numbers of web sites that contain irrelevant information. It is difficult for a user to formulate a specific query capable of producing relevant results without the user having a more detailed knowledge of a given search topic or subject area. The difficulty is even more acute when a person of lay knowledge searches in a subject area containing technical terminology, knowledge, data, acronyms, or jargon. Such users simply do not know the language of the field well enough to search effectively and efficiently. Even expert and experienced users, who may know the field, may not appreciate or understand the differences between search engines, the nuances of advanced search that exist between search engines, that certain search engines and databases are better than others, or that getting better results requires use of specific syntax.
2. Description of the Related Art
There is little related prior art that specifically focuses on improving search query word selection.
Within the realm and spectrum of existing search engines, there are generally two types of search query options: simple search and advanced search. With a simple search, the user is presented a single search box consisting of a data entry form known as a text box in which one or more words may be entered.
With advanced search, the user is presented with one or more text boxes, and is given instructions on what will happen if the user enters a search word. With some advanced search engine options, the user is given a drop down menu that instructs the search engine to use certain Boolean operators on whatever words are entered in the text box. Thus at Google.com, and most every popular search engine on the Internet, the general search option is simply a blank text box. The advanced search options allow a user to enter words of choice and the search will be conducted on "all the words", "with any of the words", as an "exact phrase" or with "none of the words". The search may also be conducted in any language or in a specified language, on any file format or on a specific file format, or within some specified time frame. The advanced search options at most of the search engines all focus on what is done with the words that are entered, rather than on what words are selected in the first place.
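To make the four advanced search options concrete, the following minimal sketch combines them into one query string. The operator syntax used here (double quotes for an exact phrase, `OR` for alternatives, a leading `-` for exclusion) is an assumed generic convention; actual syntax differs between search engines, and the function name is hypothetical.

```python
def compose_advanced_query(all_words=(), any_words=(), exact_phrase="", none_words=()):
    """Combine the four advanced-search fields into one query string.

    Assumed syntax (varies by engine):
      all_words    -> listed as-is, all required
      exact_phrase -> wrapped in double quotes
      any_words    -> joined with OR inside parentheses
      none_words   -> each prefixed with '-'
    """
    parts = list(all_words)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')
    if any_words:
        parts.append("(" + " OR ".join(any_words) + ")")
    parts.extend(f"-{word}" for word in none_words)
    return " ".join(parts)


example = compose_advanced_query(
    all_words=["salmon", "habitat"],
    any_words=["Columbia", "Snake"],
    exact_phrase="fish ladder",
    none_words=["recipe"],
)
```

Note that the function only decides what is done with the words it is given; it offers no help in choosing them, which is the gap the passage above describes.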
One recent innovation is clustering, which assists users who enter search queries by surveying the indexed listing of web site results and summarizing the topics that the results cover, suggesting related terms and new directions for a follow-on search, which can then be clicked on to get more results. Alta Vista Prisma and Vivisimo are examples of search engines and search tools that use this type of technology. These programs analyze and operate on the results of the web search, rather than on the query words themselves. Some programs search through the results from a search and create a summary listing of the metadata terms found in the search. They bring this back to the user to help the user iterate toward a better search.
A slightly different prior art approach focuses on analyzing the content of the web pages returned for a search query by multiple search engines. Available search tools still do not help users select the words to use in queries but rather take whatever words are used and use metasearch tools to organize and cluster results from one or more search engines or searchable databases. Examples include Vivisimo, Copernic, and Bullseye by Intelliseek.
Search engine expert Avi Rappaport has conducted extensive research on search tools and addressed various aspects of the field of queries. The most relevant developments are in the field known as faceted metadata search. In a recent paper she wrote:
Metadata is information about information: more precisely, it's structured information about resources. This can be a single set of hierarchical subject labels, such as a Yahoo or Open Directory Project category. More often, the metadata has several facets: attributes in various orthogonal sets of categories. This is often stored in database record fields and tables, especially for product catalogs. The current spectrum of web sites that utilize faceted metadata include:
Music stores: songs have attributes such as artist, title, length, genre, date . . .
Recipes: cuisine, main ingredients, cooking style, holiday . . .
Travel site: articles have authors, dates, places, prices . . .
Regulatory documents: product and part codes, machine types, expiration dates . . .
Image collection: artist, date, style, type of image, major colors, theme . . .
In each of these cases however, there is no single way to provide navigation for everyone: users have such disparate needs. One person might want to look through all the U2 albums, while another is looking for classical guitar or 1940s jazz releases.
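The faceted navigation described in the quoted passage can be sketched as follows. This is only an illustration under stated assumptions: the records, facet names, and helper functions are hypothetical and are not drawn from any actual catalog or from the cited paper.

```python
from collections import Counter

# Hypothetical music-store records; each key is one facet.
RECORDS = [
    {"artist": "U2", "genre": "rock", "decade": "1980s"},
    {"artist": "U2", "genre": "rock", "decade": "1990s"},
    {"artist": "Guitarist A", "genre": "classical guitar", "decade": "1950s"},
    {"artist": "Band B", "genre": "jazz", "decade": "1940s"},
]


def filter_by_facets(records, **facets):
    """Keep only records whose facet values match every requested facet."""
    return [
        record
        for record in records
        if all(record.get(name) == value for name, value in facets.items())
    ]


def facet_counts(records, facet):
    """Count the values of one facet among the given records.

    Showing these counts exposes the remaining possible values to the
    searcher instead of hiding them.
    """
    return Counter(record[facet] for record in records)


u2_albums = filter_by_facets(RECORDS, artist="U2")
jazz_1940s = filter_by_facets(RECORDS, genre="jazz", decade="1940s")
u2_decades = facet_counts(u2_albums, "decade")
```

Because each facet is independent of the others, one user can narrow by artist while another narrows by genre and decade over the same records.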
Other approaches to structured data access include parametric search and advanced search. Traditional field-based or parametric search engines for structured data have used a command line or provided a form to fill out. These require a lot of knowledge on the searcher's side; the searcher must know the values or choose from a popup menu. If they include too many parameters, they will probably not find any records that match their requirements: a dead end. The possible values are hidden from the searcher, so all the work the editorial staff has done in defining and assigning attributes is lost.
Full text search engines are another approach. Full text search engines can index all HTML metadata or gather data from multiple database fields or tables. Full text search wipes out the value of the metadata: a number 3 is just a number, not a size, price, product ID or other meaningful number, as it is in context of the tagged page or database record. Similarly, it's hard to know whether a recipe, for example, has chili pepper as a significant ingredient or minor flavoring. While many searches are just fine without that information, there are other cases where providing that context would be extremely helpful. Ms. Rappaport has also reviewed the present status of Faceted Metadata Search Resources and identifies work in progress by various organizations.
UC Berkeley professor Marti Hearst is investigating how faceted metadata can provide a dynamic information-architecture context for browsing and searching on web sites. Ms. Rappaport and her colleagues have surveyed and discussed the development of search tools but none have identified or developed a search tool like the present invention. The closest working models identified to date are for product databases and not for search engines or searchable databases.
Mr. Lou Rosenfeld, who has also surveyed search engine tools and technology, recently observed that the future lies in integrating algorithms that search, summarize and organize retrieved results with a manual approach to query building. But he observes that designing controlled vocabularies to meet users' needs and satisfy user expectations is a huge problem because of the diverse needs of the users of the Internet. Rosenfeld has observed that data is factual in nature while web content is language. Unlike data in product databases, web content is textual, and the language of web sites is ambiguous. He has also observed that there are too many individual topics out there, and that it is exceedingly difficult to create controlled vocabularies and useful thesauri to cover all users' needs. He has surveyed the field and concludes that the chances of finding a silver bullet solution are slim. The prior art does not include any web sites using a search builder method of pre-arranged controlled vocabularies at all.
The conclusion is that there is no prior art that has refined and developed a query builder using pre-arranged controlled vocabularies and an advanced search interface to search engines and searchable databases for web results. Therefore it would be useful to provide an approach to improving word selection and the creation of more precise, detailed and on point search queries, and a system that can be used to quickly create, refine, and modify search queries for submittal to search engines and searchable databases, in an interactive online search.
The system generally operates in a distributed computing environment comprising individual computer systems interconnected over a network such as the Internet, although the system could function equally well on a stand-alone computer system.
In a preferred embodiment of the present invention, one or more servers are interconnected with a plurality of clients over an internetwork, and with a plurality of personal computers over an intranetwork. The server systems include a memory (not shown), into which a server suite is loaded. The server suite provides the controls and functionality for an Internet service provider. For example, the server suite publishes web pages, thereby making each web page available to clients and PCs over the internetwork and intranetwork, respectively. In accordance with the present invention, the server suite further comprises a search builder program, web page, and user interface as further described hereinbelow.
The search builder program is coupled to a custom administrative program and database into which is compiled the information needed to operate the program. The form of the data structures used in these lists is further described hereinbelow. The search builder topics, word groups, word types, and search query word lists are entered individually as ordered lists.
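One possible in-memory representation of these ordered lists is sketched below. The nesting of topic, word group, word type, and words is an assumption made for illustration, as are all class, field, and function names and the sample entries; the specification's own data structures may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordType:
    """A named, ordered list of search query words (hypothetical layout)."""
    name: str
    words: List[str] = field(default_factory=list)


@dataclass
class WordGroup:
    """A named, ordered list of word types."""
    name: str
    word_types: List[WordType] = field(default_factory=list)


@dataclass
class Topic:
    """A search builder topic holding an ordered list of word groups."""
    name: str
    word_groups: List[WordGroup] = field(default_factory=list)


def list_words(topic, group_name, type_name):
    """Return the words for one word type, preserving their entered order."""
    for group in topic.word_groups:
        if group.name == group_name:
            for word_type in group.word_types:
                if word_type.name == type_name:
                    return list(word_type.words)
    return []


# Hypothetical entries, as an administrator might enter them.
topic = Topic(
    name="Water quality",
    word_groups=[
        WordGroup(
            name="Contaminants",
            word_types=[
                WordType(name="Metals", words=["lead", "arsenic", "mercury"]),
                WordType(name="Nutrients", words=["nitrate", "phosphate"]),
            ],
        )
    ],
)

metals = list_words(topic, "Contaminants", "Metals")
```

Plain lists are used throughout because they keep entries in the order in which they were entered, matching the "ordered lists" wording above.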
The server is also interconnected to secondary storage which can comprise any form of conventional random or non-random access storage device, such as a hard drive, CD ROM or tape system with fixed or removable media, as is known in the art.
Each web page is accessed by end users via web browsers operating on clients' personal computers over the internetwork or on personal computers on an intranetwork. Each client and PC includes user interface devices, such as keyboards and monitors (not shown) as is known in the art, by which mouse clicks, typed text and commands, search queries and other communications are input and search page results are output.
An example of a server system suitable for use with the present invention is an Intel Pentium based computer system having the following characteristics: 64 MB RAM, 1.0 GB hard drive, and network server connectivity. In the present invention, the server system is a proprietary server system suite written for and used exclusively by One World Telecommunications, Kennewick, Wash., which provides similar functionality to the Microsoft Windows NT Server Suite. The proprietary server system suite supports a simple page creation programming language that requires no knowledge of HTML programming or FTP uploads.
The foregoing aspects and many of the advantages of this invention will become more readily appreciated by reference to the following detailed description in conjunction with the accompanying drawings.