1. Field of the Invention
The invention relates to a storage system and to methods for searching and retrieving information that can be represented as a plurality of information entities.
2. Related Art
For many years, information storage systems, also referred to as database management systems (DBMS), have attracted great interest in computer technology. Two widespread classes of storage systems encompass the majority of currently existing DBMS.
Relational DBMS constitute the first, largest and most popular class of such systems. A great many of the inventions made to date relate in one way or another to relational databases.
By way of example, US patent application 20030154197, titled “Flexible relational data storage method and apparatus”, describes a method and system for creating a flexible database application that allows users to add, update and remove data columns and, optionally, the displayable data field attributes of such columns within a table of a relational database. A collection of data records is stored as a set of data in four or more special tables. This table structure gives user computers more flexible access to the DBMS server and allows data to be controlled over a computer network.
Widely used systems for storing data in the form of structured XML documents also rely on relational principles. For example, US patent application 20060101320, titled “System and method for the storage, indexing and retrieval of XML documents using relational databases”, discloses a method and system for storing, searching and retrieving XML documents within existing relational databases. The essence of the method is to transform structured XML documents so that they can be stored in conventional relational databases. During this transformation, XML documents are “disassembled” into their constituent elements, and each such element (an XML document node) is assigned several metadata attributes describing the name of the element, the data it contains, and the path from the root of the XML document to this node. Each such element is then stored in one or more data columns of a relational database.
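The disassembly step described above can be illustrated with a minimal sketch, assuming a single flat row layout of (element name, element text, path from the root). The function name `shred` and this row layout are hypothetical and are not taken from the cited publication, which may store additional metadata.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical row layout: (element name, element text, path from the root).
Row = Tuple[str, str, str]


def shred(xml_text: str) -> List[Row]:
    """Flatten an XML document into one row per element node (depth-first)."""
    rows: List[Row] = []

    def walk(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        rows.append((elem.tag, text, path))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return rows


doc = "<library><book><title>Dune</title><year>1965</year></book></library>"
for row in shred(doc):
    print(row)
# ('library', '', '/library')
# ('book', '', '/library/book')
# ('title', 'Dune', '/library/book/title')
# ('year', '1965', '/library/book/year')
```

This sketch ignores XML attributes and records no node identifiers or sibling positions, so two `book` elements would share the path `/library/book`; a scheme that must rebuild documents exactly would need such identifiers as well.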
Stored XML documents are retrieved by translating XPath and/or XQuery queries into SQL queries against the relational database. The search results thus obtained undergo the reverse transformation, which restores the required XML documents from them.
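The query translation described above can be sketched for a very small XPath subset, assuming the hypothetical (name, value, path) table from the shredding example, here held in an in-memory SQLite database. The table name `nodes` and the function `xpath_to_sql` are illustrative assumptions; real XPath and XQuery translators handle far more of the language.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical shredded table contents: (name, value, path) per element.
ROWS = [
    ("library", "", "/library"),
    ("book", "", "/library/book"),
    ("title", "Dune", "/library/book/title"),
    ("book", "", "/library/book"),
    ("title", "Solaris", "/library/book/title"),
]


def xpath_to_sql(xpath: str) -> Tuple[str, Tuple[str, ...]]:
    """Translate a tiny XPath subset into a parameterized SQL query.

    Supported: absolute child paths (/a/b/c) and a single descendant
    step (//c). Predicates, wildcards, attributes and functions are
    rejected instead of being silently mistranslated.
    """
    if any(ch in xpath for ch in "[]*@()"):
        raise ValueError(f"unsupported XPath: {xpath}")
    if xpath.startswith("//"):
        name = xpath[2:]
        if not name or "/" in name:
            raise ValueError(f"unsupported XPath: {xpath}")
        return "SELECT value FROM nodes WHERE name = ? ORDER BY rowid", (name,)
    if xpath.startswith("/") and "//" not in xpath:
        return "SELECT value FROM nodes WHERE path = ? ORDER BY rowid", (xpath,)
    raise ValueError(f"unsupported XPath: {xpath}")


def query(conn: sqlite3.Connection, xpath: str) -> List[str]:
    sql, params = xpath_to_sql(xpath)
    return [value for (value,) in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, value TEXT, path TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", ROWS)
print(query(conn, "/library/book/title"))  # ['Dune', 'Solaris']
print(query(conn, "//title"))              # ['Dune', 'Solaris']
```

The reverse step, rebuilding whole XML documents from result rows, is omitted: with only (name, value, path) stored, the rows do not record which `title` belongs to which `book`, which is one reason such schemes attach node identifiers or parent links.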
It is pertinent to note that all methods and systems based on relational principles of data storage share common weaknesses. A practical implementation of the invention disclosed herein makes it possible to overcome a number of serious limitations intrinsic to the relational approach:

The known technical solutions provide a single center of control over all processes pertaining to data storage.

Decentralization is needed that would distribute the computing resources and capacity required for data storage and processing among all nodes of the repository (and among all computer platforms on which the repository is deployed). Such an approach, first, yields greater stability and fault tolerance of the storage system, because there is no single node whose failure renders the whole system inoperable (a so-called single point of failure), and because of the wide range of options for backing up data stored in a peer-to-peer network of storage devices; and, second, decentralization provides virtually unlimited scalability of the repository's storage capacity and of the number of stored data objects, as well as a huge potential for scaling the number of search queries the system can handle per unit of time (that is, the query rate).

The relational approach to storage distributes the data of information objects among a plurality of interrelated tables according to a pre-designed relational structure. At the same time, processing retrieval queries often requires assembling the sought data object from a plurality of records in various tables, which, for a large and heavily ramified relational structure and a large volume of stored data, significantly reduces the rate at which such queries are processed.

A different approach is needed in which all information about any object is concentrated at a single point, in a single data object, and a user's query returns the entire object at once. In this case the resource intensity of retrieval procedures may be significantly lowered (for example, it may become proportional to the logarithm of the number of stored objects).

In a relational database, the hardware configuration of a storage server must be determined at the database design stage and cannot be changed afterwards. If the database grows too large and the query rate exceeds the computing power and channel capacity of a physical server, extending the computing resources or adding new servers to the system requires a complete shutdown of the database and, probably, a revision of its relational data model, optimization of the distribution of the relational structure among physical storage devices, and a subsequent restart. All these procedures obviously involve considerable additional expense.

A solution is needed that could resolve the problem of physical server overload simply by adding one or several computers to the network, thus enabling further database scaling. In other words, a system capable of scaling automatically as physical resources are added is necessary.
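The contrast drawn above, between assembling an object from several tables and returning a single self-contained object, can be sketched as follows. The three-table schema (`person`, `address`, `phone`) and the `ObjectStore` class are hypothetical illustrations, not the storage scheme of the present invention; the sorted-key binary search merely shows one familiar way to obtain lookup cost proportional to the logarithm of the number of stored objects.

```python
import bisect
import sqlite3
from typing import Dict, List, Optional

# Relational layout (hypothetical schema): one object is spread over three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (person_id INTEGER, city TEXT);
CREATE TABLE phone   (person_id INTEGER, number TEXT);
INSERT INTO person  VALUES (1, 'Ada');
INSERT INTO address VALUES (1, 'London');
INSERT INTO phone   VALUES (1, '555-0100');
""")

# Reassembling the object requires joining the tables at query time.
row = conn.execute("""
SELECT p.name, a.city, ph.number
FROM person p
JOIN address a ON a.person_id = p.id
JOIN phone ph ON ph.person_id = p.id
WHERE p.id = ?
""", (1,)).fetchone()
print(row)  # ('Ada', 'London', '555-0100')


class ObjectStore:
    """Each object is stored whole under a sorted key.

    get() is a single binary search, O(log n) in the number of objects.
    put() inserts into Python lists, which is O(n); a production store
    would use a balanced tree or B-tree instead.
    """

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._objects: List[Dict[str, str]] = []

    def put(self, key: int, obj: Dict[str, str]) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._objects[i] = obj  # overwrite the existing object
        else:
            self._keys.insert(i, key)
            self._objects.insert(i, obj)

    def get(self, key: int) -> Optional[Dict[str, str]]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._objects[i]
        return None


store = ObjectStore()
store.put(1, {"name": "Ada", "city": "London", "phone": "555-0100"})
print(store.get(1))  # {'name': 'Ada', 'city': 'London', 'phone': '555-0100'}
```

With three small tables the join is cheap; the cost the text describes appears as the number of tables, the depth of the relational structure and the data volume grow.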
Distributed, or peer-to-peer, DBMS form the second widespread class of data storage systems. Data in such systems are stored not centrally on separate specialized devices (data storage servers) but in a plurality of nodes of a peer-to-peer network, the nodes comprising storage devices provided with special software. The data storage method disclosed in this invention pertains to precisely this class of database systems.
Most methods for data storage in decentralized DBMS use hash values to retrieve stored data. For example, US patent application 20060242155, titled “Systems and methods for providing distributed, decentralized data storage and retrieval”, discloses a method and system for distributed, decentralized data storage and retrieval. In that solution, data are represented as bit streams of multimedia information. Each bit stream is divided into individual fragments, which are stored in the nodes of a peer-to-peer network of a decentralized storage system.
Each node maintains a local routing table with information about one or more neighboring nodes. At least one of the local routing tables contains hash data from which the address of the node to which a bit stream should be forwarded for storage can be determined.
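The general idea of hash-based placement described above can be sketched as follows, assuming fixed-size fragments, SHA-256 digests and a simple digest-modulo-node-count rule. The fragment size, node names and mapping rule are all hypothetical; the cited publication's actual routing-table mechanism is not reproduced here.

```python
import hashlib
from typing import Dict, List

FRAGMENT_SIZE = 4  # bytes per fragment; hypothetical, kept small for the demo
NODES = ["node-a", "node-b", "node-c"]  # hypothetical peer addresses


def fragments(stream: bytes, size: int = FRAGMENT_SIZE) -> List[bytes]:
    """Split a byte stream into fixed-size fragments (the last may be shorter)."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def node_for(fragment: bytes, nodes: List[str] = NODES) -> str:
    """Choose the storing node from the first 8 bytes of the SHA-256 digest."""
    digest = hashlib.sha256(fragment).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def place(stream: bytes) -> Dict[str, List[bytes]]:
    """Map every fragment of the stream to the node chosen by its hash."""
    placement: Dict[str, List[bytes]] = {n: [] for n in NODES}
    for frag in fragments(stream):
        placement[node_for(frag)].append(frag)
    return placement


placement = place(b"example multimedia bit stream")
for node, frags in placement.items():
    print(node, frags)
```

Any peer can recompute `node_for` and so locate a fragment without a central directory. The modulo rule is chosen only for brevity: it remaps most fragments whenever the node count changes, which is why deployed peer-to-peer systems typically use consistent hashing or a distributed hash table instead. A real system would also keep an index of fragment order so the stream can be reassembled.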
Using hash values to store and retrieve data in distributed DBMS limits the flexibility of search queries in such databases. In particular, only exact-match search is possible, in which the search query exactly matches the data attributes addressed by hash codes. If the query data differ from the attributes of the sought objects by even one bit, those objects will not be found, despite their relevance to the query.
A solution is needed that could obviate this shortcoming and make search more flexible.
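The exact-match limitation described above can be demonstrated with a minimal hash-addressed index, assuming SHA-256 as the hash and a plain dictionary as the index; the names `store` and `lookup` are hypothetical. Flipping a single bit of the query produces an unrelated digest, so the stored object cannot be found.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical hash-addressed index: an object is reachable only through
# the digest of its attribute value.
index: Dict[str, str] = {}


def key(attribute: bytes) -> str:
    return hashlib.sha256(attribute).hexdigest()


def store(attribute: bytes, obj: str) -> None:
    index[key(attribute)] = obj


def lookup(attribute: bytes) -> Optional[str]:
    return index.get(key(attribute))


store(b"colour", "object #1")

query_exact = b"colour"
# Flip the lowest bit of the last byte: b"r" (0x72) becomes b"s" (0x73).
query_near = query_exact[:-1] + bytes([query_exact[-1] ^ 0x01])

print(query_near)                     # b'colous'
print(lookup(query_exact))            # object #1
print(lookup(query_near))             # None
print(key(query_exact)[:16], key(query_near)[:16])
```

The last line prints the two digest prefixes; they share no visible structure, which is exactly why a hash index cannot rank "nearby" attributes as relevant.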
In addition, modern search engines estimate the relevance of a response to a query on the basis of matches between the keywords of the requested data and the words of the query. Such methods must take into account the multiple possible forms of each keyword, determined by inflection for gender, case, number, etc., yet they take no account of the degree of similarity between the intrinsic structure of a query mask and that of the sought information objects. Therefore, search methods that take the order of keywords in the query into account are essential.
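The difference between order-blind keyword matching and order-aware matching described above can be sketched as follows. The suffix-stripping normalizer is a deliberately crude, hypothetical stand-in for real inflection handling (it covers only a few English endings, not gender or case), and `keyword_score` and `ordered_match` are illustrative names, not the method of the present invention.

```python
import re
from typing import List

# Hypothetical, deliberately crude normalizer: lower-cases words and strips a
# few English inflectional endings. Real systems use a proper stemmer or
# lemmatizer for each language.
SUFFIXES = ("ing", "es", "ed", "s")


def normalize(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text: str) -> List[str]:
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]


def keyword_score(query: str, doc: str) -> int:
    """Order-blind score: how many query keywords occur anywhere in the doc."""
    doc_terms = set(tokens(doc))
    return sum(1 for t in tokens(query) if t in doc_terms)


def ordered_match(query: str, doc: str) -> bool:
    """True if the query keywords occur in the doc in the same order
    (not necessarily adjacent). `t in it` consumes the iterator, so each
    keyword must be found after the previous one."""
    it = iter(tokens(doc))
    return all(t in it for t in tokens(query))


query = "dog chases cat"
doc_a = "The dogs chased a cat"
doc_b = "The cat is chasing dogs"

print(keyword_score(query, doc_a), keyword_score(query, doc_b))  # 3 3
print(ordered_match(query, doc_a), ordered_match(query, doc_b))  # True False
```

Both documents contain every query keyword once inflection is normalized, so the order-blind score cannot tell them apart; only the order-aware check separates "dogs chased a cat" from "cat is chasing dogs".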