A database management system is a means for manipulating and managing databases, and is used for creating, using, and maintaining databases. It manages and controls databases in a unified manner so as to ensure the security and integrity of the databases.
With the arrival of the big data era, transaction and interaction data are growing rapidly, and terabyte (TB)-level data processing has become a basic requirement. Data types have also shifted from a single type to diverse types, namely structured, unstructured, and semi-structured data. Structured data generally refers to enterprise data such as Enterprise Resource Planning (ERP) data and financial system data. Unstructured data refers to data such as audio, pictures, and video. Semi-structured data refers to self-describing data that has an implicit but loosely defined structure, such as e-mails, Hypertext Markup Language (HTML) documents, reports, and repositories.
Conventional relational database management systems have limitations in processing such large-scale, diverse data and are especially ill-suited to unstructured and semi-structured data. This led to the emergence of NoSQL.
NoSQL refers to non-relational databases, or databases for storing unstructured data. Column-oriented NoSQL stores, such as HBase and OTS, organize data by column to facilitate reading and writing large volumes of data. A NoSQL storage model may be expressed as a table: each table includes multiple rows, and each row is divided into multiple columns. When the table is created, a primary key column must be specified for the rows. The primary key is usually used to group data, and rows with adjacent primary keys are usually stored together. One way to support queries on a NoSQL database is to create an index for it.
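The table model just described, with rows kept in primary-key order so that adjacent keys sit together, can be illustrated with a small in-memory sketch. The class `SortedTable`, its methods, and string keys such as `"user#001"` are hypothetical; this is not the HBase or OTS API, and a real store persists and partitions the sorted rows across servers.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy model of a NoSQL table whose rows are kept in primary-key order (assumption)."""

    def __init__(self):
        self._keys = []   # primary keys, always kept sorted
        self._rows = {}   # primary key -> {column name: value}

    def put(self, key, columns):
        # Insert a new key at its sorted position so rows with
        # adjacent primary keys stay next to each other.
        if key not in self._rows:
            insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        # Return the row's columns, or None if the key is absent.
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        # Rows sharing a key prefix are contiguous, so one range scan finds them all.
        start = bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result
```

Because `"user#001"` and `"user#002"` sort next to each other, `scan_prefix("user#")` returns both rows without touching unrelated keys such as `"order#001"`.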
An index is a structure that sorts values of one or more columns in a database table, and by using the index, a user can quickly access specific information in the database table. Therefore, required information may be found by creating an index for a database.
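As a rough illustration of an index that sorts the values of one column, the sketch below builds a sorted list of column values paired with their row keys and finds matching rows by binary search. The function names `build_index` and `lookup` and the sample column `city` are hypothetical, chosen only for this example.

```python
from bisect import bisect_left, bisect_right


def build_index(rows, column):
    """Sort (column value, primary key) pairs by value; rows lacking the column are skipped."""
    pairs = sorted((row[column], key) for key, row in rows.items() if column in row)
    values = [value for value, _ in pairs]  # sorted column values
    keys = [key for _, key in pairs]        # row keys aligned with values
    return values, keys


def lookup(index, value):
    """Binary-search the sorted values and return the primary keys of matching rows."""
    values, keys = index
    lo = bisect_left(values, value)
    hi = bisect_right(values, value)
    return keys[lo:hi]
```

A lookup costs a logarithmic number of comparisons in the index size instead of a pass over every row.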
An inverted index is one of the most commonly used data structures in NoSQL databases. It allows a user to search for records by attribute value: each entry in an inverted index table includes an attribute value and the locations of the records that have that value. Rather than a record being used to find its attribute values, the attribute value is used to find the locations of the records, which is why the structure is called an inverted index. Taking a common type of NoSQL database, a document retrieval system, as an example: if an inverted index is to be created for document files stored on a hard disk, the attribute values are the keywords in the documents. The entry for each keyword in the inverted index table includes the hard disk storage locations of the document files that contain the keyword, and each such storage location is called an index value.
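A minimal sketch of building such an inverted index for documents on disk follows. It assumes that keywords are obtained by a naive lowercase whitespace split and that each document is identified by a file path serving as its storage location; both choices, and the function name `build_inverted_index`, are illustrative assumptions.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each keyword to the sorted storage locations of the documents containing it.

    `documents` maps a storage location (a hypothetical file path) to its text.
    Keywords come from a naive lowercase whitespace split (an assumption).
    """
    index = defaultdict(set)
    for location, text in documents.items():
        for keyword in text.lower().split():
            index[keyword].add(location)  # the location is the index value
    return {keyword: sorted(locations) for keyword, locations in index.items()}
```

For example, with documents `"/disk/a.txt": "NoSQL index design"` and `"/disk/b.txt": "inverted index"`, the entry for `"index"` lists both paths, while `"nosql"` lists only `"/disk/a.txt"`.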
Continuing with the document retrieval system example, when a database index is to be created for a large number of documents, the documents and the index may be stored, respectively, in a data table and an inverted index table of a NoSQL database. The primary key of the data table is the document ID, and its value is the document content. The primary key of the inverted index table is a keyword, and its value is the list of documents that contain the keyword. Using a keyword as the primary key of the inverted index table, a user can find the IDs of all documents that contain the keyword, and then look up the corresponding document contents in the data table by those IDs. This method allows the information a user needs to be retrieved quickly from massive amounts of data, and thereby achieves the purpose of indexing a NoSQL database.
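The two-table layout and the two-step query described above can be sketched with plain dictionaries standing in for the NoSQL tables. The function names `build_tables` and `search`, and the whitespace tokenization, are hypothetical choices for illustration, not an existing database API.

```python
from collections import defaultdict


def build_tables(documents):
    """Build the data table (doc ID -> content) and inverted index table (keyword -> doc IDs)."""
    data_table = dict(documents)
    inverted_table = defaultdict(list)
    for doc_id in sorted(data_table):  # sorted so each doc ID list comes out in order
        # set() records a keyword repeated within one document only once
        for keyword in set(data_table[doc_id].lower().split()):
            inverted_table[keyword].append(doc_id)
    return data_table, dict(inverted_table)


def search(data_table, inverted_table, keyword):
    """Step 1: keyword -> doc IDs in the index table; step 2: doc IDs -> contents."""
    doc_ids = inverted_table.get(keyword.lower(), [])
    return [(doc_id, data_table[doc_id]) for doc_id in doc_ids]
```

Searching for `"index"` over the documents `d1: "NoSQL index"` and `d2: "big data index"` first yields the IDs `["d1", "d2"]` from the inverted index table and then fetches both contents from the data table.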
However, existing NoSQL-based inverted indexes have low query efficiency: when a user queries by keyword, the entire inverted index table must be searched to find that keyword, and the search time grows rapidly as the amount of data increases.
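To make the cost of searching the whole table concrete, the sketch below models the inverted index table as a plain list of (keyword, document IDs) entries and counts the comparisons made by a sequential search. The list representation and the function name `scan_for_keyword` are assumptions for illustration only. When the keyword is absent, every entry is compared, so the work grows with the number of entries.

```python
def scan_for_keyword(entries, keyword):
    """Sequentially scan (keyword, doc IDs) entries; return the doc IDs and the comparison count."""
    comparisons = 0
    for entry_keyword, doc_ids in entries:
        comparisons += 1
        if entry_keyword == keyword:
            return doc_ids, comparisons
    return [], comparisons  # keyword absent: every entry was compared
```

With 1,000 entries, a missing keyword costs 1,000 comparisons; with 2,000 entries it costs 2,000.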
In addition, existing NoSQL database indexes are updated very inefficiently. In the document retrieval example, when a new document is added to an existing NoSQL system, the original inverted index table must first be searched to locate the entries for the new document's keywords, and only then can the new document's ID be written into those entries. Because the contents of the inverted index table must be read before each write, the index update speed drops significantly, and when the database is very large it becomes unacceptable.
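The read-before-write pattern described above can be sketched as follows. The class `InvertedIndexTable`, its `read`/`write` methods, the read counter, and the whitespace tokenization are hypothetical stand-ins for a NoSQL table client; the point is that each distinct keyword of a new document triggers one read of the existing document list before the updated list is written back.

```python
class InvertedIndexTable:
    """Toy inverted index table that counts reads to expose the read-before-write cost."""

    def __init__(self):
        self._rows = {}   # keyword -> list of doc IDs
        self.reads = 0    # number of read operations performed

    def read(self, keyword):
        self.reads += 1
        return list(self._rows.get(keyword, []))  # copy of the stored list

    def write(self, keyword, doc_ids):
        self._rows[keyword] = doc_ids


def add_document(table, doc_id, text):
    """Read-modify-write update: one read per distinct keyword of the new document."""
    for keyword in sorted(set(text.lower().split())):
        doc_ids = table.read(keyword)   # read the current document list first
        doc_ids.append(doc_id)          # add the new document ID
        table.write(keyword, doc_ids)   # write the whole list back
```

Adding `"nosql index"` and then `"index update speed"` performs five reads, one per distinct keyword per document, before any new ID reaches the table.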
Because existing techniques for indexing NoSQL-stored data suffer from both low query efficiency and low update efficiency, such systems have low throughput and cannot handle writes and queries over terabyte-scale document collections.
Therefore, creating a NoSQL database index with higher throughput has become an urgent technical problem to be solved.