The present invention relates to database management, and more specifically, to a method and apparatus for indexing a document database.
With the emergence of Internet Web 2.0, NoSQL (non-relational) databases have become an extremely popular new field. Relational databases struggle to meet the demand for highly concurrent reads and writes, for highly efficient storage of and access to massive data, and for high scalability and high availability. Compared with a relational database, a NoSQL database scales flexibly. There are various types of NoSQL databases, but their common feature is that the relational characteristics of the relational database have been removed. Because there are no relationships between data items, the database is easy to extend, which brings scalability at the architectural level. The document database is a very important branch of non-relational databases, and it is mainly used for storing, indexing and managing document-oriented data or similar semi-structured data. As the name suggests, the core concept of the document database (document-oriented database) is the document, which is the smallest unit in the database. MongoDB is currently among the most popular NoSQL databases. It is a set-oriented, model-independent document database in which data are grouped by “set” (a “collection” in MongoDB's own terminology); each set has a unique name and may contain an unlimited number of documents. Here, the set is similar to a table in a relational database; the only difference is that it does not have any explicit schema.
Creating a database index is an important aspect of database management. A database index is a data structure that sorts the values of one or more columns in a database table and refers (points) to the underlying data in some way, so that data in the table can be queried and updated rapidly. A relational database is usually stored in a table structure, so an index can simply be established on certain fixed fields. A document database, by contrast, usually does not define a field structure, and while it is in use, new documents constantly introduce new field structures; pre-selecting some fixed fields therefore cannot effectively cope with dynamic changes in the document fields of a document database. In addition, because data chunking is more difficult, indexing a relational database is directed to all the data in a table. When the amount of data is huge, and especially when all the data in a document database that is providing online services are being indexed, database access performance during indexing becomes very poor.
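By way of illustration only, the following minimal Python sketch shows an index as a data structure that points from field values back to documents, and shows how a document introducing a new field falls outside a pre-selected list of indexed fields. The function name `build_index`, the sample documents, and the field names are hypothetical and are not taken from any particular database product.

```python
from collections import defaultdict

def build_index(documents, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(documents):
        if field in doc:  # documents lacking the field are simply skipped
            index[doc[field]].append(position)
    return dict(index)

# Hypothetical schema-less documents; the third one introduces a new field.
documents = [
    {"name": "alice", "city": "Paris"},
    {"name": "bob", "city": "Rome"},
    {"name": "carol", "city": "Paris", "tags": "admin"},
]

# With the index, a lookup by city does not need to scan every document.
city_index = build_index(documents, "city")
paris_docs = [documents[i]["name"] for i in city_index.get("Paris", [])]

# Fields present in the data but absent from the pre-selected index list.
indexed_fields = {"city"}
unindexed_fields = set().union(*(doc.keys() for doc in documents)) - indexed_fields
```

In this sketch, queries on the unindexed fields would still require a full scan, which is the limitation of fixed-field indexing described above.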
Accordingly, a method for effectively indexing a document database is needed.