The present invention relates to a system and method for structuring digitally stored information and a method for searching this information. A computer program product and uses of the system and methods are also disclosed. In particular, the invention addresses the problem of searching large information spaces or databases, such as a national telephone directory, large file systems or the Internet.
The large and increasing amount of information available today in electronic form places heavy demands on computer hardware, for example on memory capacity and processor speed, when this information is searched. Examples of information in electronic form are network versions of telephone directories, files stored on computer hard disks or on network servers (e.g. LAN, WAN), and www pages. The digital information may be organized and stored in large databases, and retrieving information from these structures requires complex search routines, powerful processors and storage capacity. Even so, retrieving the desired information from these databases may still be a time consuming and tedious process.
For the Internet, which is a very large information space, various search engines and searchable directories (like Yahoo) have been developed for searching and retrieving the existing information. The information is indexed and arranged in a searchable format, e.g. databases, and stored on servers. A problem with such prior art search engines and searchable directories is the need for large physical storage capacity. All the indexed and/or processed information is physically stored, and searching all this information, often arranged in one huge database, is not always very efficient.
A binary search, which is often utilized when searching database structures, is a search algorithm that repeatedly divides an ordered search space in half according to how the required value compares with the middle element. When searching large databases, this becomes a time consuming process, as the entire database must be searched at least once. Usually, searches are performed in selected columns in the database only. If it is necessary to combine information arranged in different columns in the database to achieve a usable and meaningful search result, and the database is large, the search procedure may take a very long time, and is sometimes not performable due to the huge number of possible combinations of the information arranged in the cells in the different columns. How this information should be presented to the user in a usable and meaningful manner is also a problem if a search request, in e.g. a database or any other information space, results in a large number of hits. Searching is often elaborate, which contrasts with users' demand for information presented in an instant.
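By way of illustration only, the halving step of a binary search as described above might be sketched in Python as follows. The function name `binary_search` and the sample list `names` are assumptions of this sketch and are not taken from the prior art discussed.

```python
def binary_search(sorted_values, target):
    """Return the position of target in an ordered list, or -1 if absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2          # middle element of the remaining space
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1                # discard the lower half
        else:
            high = mid - 1               # discard the upper half
    return -1


# Hypothetical ordered column of surnames
names = ["Andersen", "Berg", "Dahl", "Hansen", "Olsen"]
print(binary_search(names, "Hansen"))
print(binary_search(names, "Moe"))
```

The sketch presupposes that the searched column is already ordered, which is one reason such searches are usually restricted to selected columns.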
The present invention has been conceived to solve, or at least alleviate, the problems of the prior art as described above. Thus, in accordance with a first aspect of the present invention there is provided a system for structuring digitally stored information, the system being included in a data processing system. The structuring system comprises: a database comprising a number of cells arranged in rows and columns for holding pieces of data representing the information, wherein only one piece of data may be stored in each cell, the pieces of data contained in cells in one row of the database constituting a certain information; an index table for each unique piece of data occurring in the database, each index table providing information concerning all locations of that unique piece of data in the database; and a main index listing once all the unique pieces of data in the database together with a corresponding index table identifier, the index table identifier providing a link to the corresponding index table for that particular piece of data.
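A minimal sketch of how the three structures of the first aspect (database, index tables and main index) could relate to one another is given below in Python. The function name `build_indexes`, the use of integer identifiers, the 1-based (x, y) numbering and the sample rows are all assumptions made for illustration; the invention is not limited to this representation.

```python
from collections import defaultdict


def build_indexes(database):
    """Build index tables and a main index for a row/column database.

    database: list of rows, each row a list holding one piece of data per cell.
    Returns (main_index, index_tables).
    """
    locations = defaultdict(list)
    for y, row in enumerate(database, start=1):        # y = row (assumed 1-based)
        for x, piece in enumerate(row, start=1):       # x = column (assumed 1-based)
            locations[piece].append((x, y))

    main_index = {}      # unique piece of data -> index table identifier
    index_tables = {}    # index table identifier -> all (x, y) locations
    for table_id, piece in enumerate(sorted(locations), start=1):
        main_index[piece] = table_id
        index_tables[table_id] = locations[piece]
    return main_index, index_tables


# Hypothetical two-row database
database = [
    ["Oslo", "Hansen", "Per"],
    ["Bergen", "Hansen", "Kari"],
]
main_index, index_tables = build_indexes(database)
print(main_index["Hansen"], index_tables[main_index["Hansen"]])
```

In this sketch each unique piece of data appears exactly once in `main_index`, and its identifier links to the table holding every cell in which that piece occurs.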
The locations may be expressed by coordinates, the coordinates defining the cells in the database. Preferably the coordinates are (x, y) coordinate pairs respectively representing the columns and rows in the database. The coordinate pairs in the index tables are then sorted first according to the absolute value of the x coordinate and second according to the value of the y coordinate. The pieces of data with the highest absolute values of x have by definition a higher relevance than pieces of data with lower absolute values of x. Relevance here means that the pieces of data found to best describe the represented information are assigned a high x value. In one embodiment the database may also comprise an address or link to the digitally stored information represented by the pieces of data contained in each row in the database.
The pieces of data contained in cells in one row in the database may represent a search string associated with a URL address or a file path. The digitally stored information may represent a telephone directory.
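The two-level ordering of the coordinate pairs may be sketched as follows. The text does not state the sort direction; this sketch assumes that the highest absolute x values (highest relevance) come first and that y is ascending. The function name `sort_locations` and the sample coordinates are hypothetical.

```python
def sort_locations(locations):
    """Order (x, y) pairs by |x| (assumed descending), then by y (assumed ascending)."""
    return sorted(locations, key=lambda xy: (-abs(xy[0]), xy[1]))


# Hypothetical index table contents before sorting
unsorted_locations = [(1, 3), (-3, 2), (2, 1), (3, 1)]
sorted_locations = sort_locations(unsorted_locations)
print(sorted_locations)
```

Under these assumptions, reading an index table from the top yields the most relevant locations first.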
In a second aspect, the invention provides a method in a data processing system for structuring digitally stored information. The method comprises: processing the information in a data processing unit and storing pieces of data representative of the information in cells in a database, the cells in the database being arranged in rows and columns, all the cells in the same row in the database constituting a certain information; creating at least one index for each unique piece of data occurring in the database comprising information of all locations of the unique piece of data in the database; and creating a main index listing once all the unique pieces of data occurring in the database together with a corresponding index identifier providing a link to the corresponding index for a unique piece of data.
In a first embodiment of the invented method, an indexing agent is used for indexing the stored information. This indexing agent may be a spider, web crawler or any other suitable agent. Preferably, the information is processed and the pieces of data representative of the information are arranged in the database in such a way that pieces of data assigned high absolute values of the x coordinates are more descriptive of the represented information than pieces of data assigned lower absolute values of the x coordinates. The pieces of data may be keywords describing the digitally stored information. When the keywords contained in cells in each row in the database constitute a search string, the method further comprises creating an index for each position at which a unique keyword occurs in the search strings, and creating a corresponding index identifier associated with the keyword in the main index. A resource locator for the processed information may be included in each row in the database, providing a link to the digitally stored information.
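One possible reading of the per-position indexing of search strings is sketched below in Python. It assumes that the index identifier is simply the pair (keyword, position) and that each index stores row numbers only; neither choice is prescribed by the method, and the function name `build_positional_indexes` and the sample search strings are hypothetical.

```python
def build_positional_indexes(search_strings):
    """Create one index per (keyword, position) and list its identifier in the main index.

    search_strings: list of rows, each row an ordered list of keywords.
    """
    main_index = {}      # keyword -> list of index identifiers
    index_tables = {}    # (keyword, position) -> row numbers (assumed 1-based)
    for y, keywords in enumerate(search_strings, start=1):
        for position, keyword in enumerate(keywords, start=1):
            identifier = (keyword, position)
            index_tables.setdefault(identifier, []).append(y)
            identifiers = main_index.setdefault(keyword, [])
            if identifier not in identifiers:
                identifiers.append(identifier)
    return main_index, index_tables


# Hypothetical search strings, one per database row
search_strings = [
    ["hotel", "oslo", "cheap"],
    ["oslo", "hotel"],
    ["hotel", "bergen"],
]
main_index, index_tables = build_positional_indexes(search_strings)
print(main_index["hotel"])
print(index_tables[("hotel", 1)])
```

Here the keyword "hotel" receives two index identifiers because it occurs at two different positions in the search strings.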
In a third aspect, the invention provides a method in a data processing system for searching digitally stored information, wherein the information is structured in a database/index system as defined above. The method comprises: inputting a desired information through an interface; searching the main index table selecting pieces of data corresponding to the desired information and thereby selecting index tables; searching the selected index tables selecting at least one location of a cell in the database containing the desired information; and selecting the row in the database in which the cell is located and retrieving the desired information.
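The steps of the search method (main index, then index table, then database row) might be sketched as follows. The function name `search`, the dictionary-based structures and the sample directory entries are assumptions made for illustration only, and the (x, y) locations are taken to be 1-based.

```python
def search(term, main_index, index_tables, database):
    """Return the database rows containing term, found via the main index."""
    table_id = main_index.get(term)          # step 1: look up the main index
    if table_id is None:
        return []
    rows, seen = [], set()
    for x, y in index_tables[table_id]:      # step 2: read locations from the index table
        if y not in seen:
            seen.add(y)
            rows.append(database[y - 1])     # step 3: select the row holding the cell
    return rows


# Hypothetical directory-like database with prebuilt indexes
database = [
    ["Hansen", "Per", "Oslo"],
    ["Hansen", "Kari", "Bergen"],
    ["Olsen", "Per", "Oslo"],
]
main_index = {"Bergen": 1, "Hansen": 2, "Kari": 3, "Olsen": 4, "Oslo": 5, "Per": 6}
index_tables = {
    1: [(3, 2)],
    2: [(1, 1), (1, 2)],
    3: [(2, 2)],
    4: [(1, 3)],
    5: [(3, 1), (3, 3)],
    6: [(2, 1), (2, 3)],
}
print(search("Per", main_index, index_tables, database))
```

In this sketch the database itself is never scanned; only the main index and one index table are consulted before the matching rows are fetched.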
In one embodiment, when the desired information is expressed in the form of a sequenced keyword search string, the method further comprises determining the order of the keyword in the sequence of keywords, and selecting index tables corresponding both to the desired information and the order of the keyword in the input search string.
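A sketch of the sequenced keyword search is given below. It assumes position-specific index tables keyed by (keyword, position) and holding row numbers, and it assumes that the rows selected for the individual keywords are combined by intersection; the embodiment does not state how the selected index tables are combined. The function name `search_sequenced` and the sample data are hypothetical.

```python
def search_sequenced(query, index_tables):
    """Return row numbers whose keywords match the query at the same positions."""
    matching = None
    for position, keyword in enumerate(query.split(), start=1):
        # Select the index table for this keyword at this position in the input
        rows = set(index_tables.get((keyword, position), []))
        matching = rows if matching is None else matching & rows
    return sorted(matching or [])


# Hypothetical position-specific index tables: (keyword, position) -> row numbers
index_tables = {
    ("hotel", 1): [1, 3],
    ("oslo", 2): [1],
    ("cheap", 3): [1],
    ("oslo", 1): [2],
    ("hotel", 2): [2],
    ("bergen", 2): [3],
}
print(search_sequenced("hotel oslo", index_tables))
print(search_sequenced("oslo hotel", index_tables))
```

Under these assumptions the same two keywords entered in a different order select different index tables and therefore different rows.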
The desired information may be input through a search engine interface, and the retrieved information displayed on a display device. A typical display device may be a computer screen, but also the display on a mobile phone or WAP phone.
In a fourth aspect, the invention provides a computer program product for a data processing system, comprising a computer readable medium, having thereon computer readable program means which, when loaded into an internal memory of a data processing system, makes the data processing system perform the structuring method as defined above.
In a fifth aspect the invention also provides a computer program product for a data processing system, comprising computer readable code means which, when loaded into an internal memory of a data processing system, makes the data processing system perform the search method as defined above.
The invented system and methods may be used in a search engine for searching the Internet, in a handheld electronic device comprising a processor and memory (e.g. a mobile phone, a WAP phone or a portable computer) or in a computer for retrieving files in a data storage device. The invention provides a solution for organizing and searching information in an efficient manner, and presenting the information in an instantly usable way. The invented solution provides faster processing by minimising the search itself, and also results in reduced costs for running and upgrading the search system. Information can easily be added and deleted, and provided independent of the search language used. The invention is defined in the appended claims.