Systems that include data processing usually require extensive planning and knowledge of the available alternatives before a data processing technology that provides an efficient and economical solution can be chosen. It is, however, becoming increasingly hard for application developers, enterprises, etc. to know which type of technology best suits their needs, and it is especially hard to predict future needs, e.g. due to possible future changes in user behaviour and thereby changed requirements on the technology used. Today, these types of processes rely entirely on technical experts and their knowledge of which technology is “best” suited at the moment and which technology is expected to remain the “best” choice in the future. Such a selection process is therefore largely manual, which almost inevitably makes it both costly and time consuming.
It is also hard to determine when the current solution no longer works optimally, and thus when it would be more advantageous to use another, alternative technology. Underlying reasons for a change of database technology could be e.g. major changes in the applied data access patterns, changes in the amount of data stored/retrieved, new requirements from one or more applications using the data handled by the database(s) in use, new usage of an application, or new usage of stored data generated by an application.
The amount of data generated each day is growing faster and faster. The number of different database solutions available for handling this data is also growing rapidly, and there is today a multitude of different database solutions to choose from, such as e.g.:
- RDBMS
- Key-value stores
- Graph databases
- In-memory databases
- Document databases
Each of these database technologies (and in some cases specific data processing solutions) has certain characteristics, advantages and disadvantages and is therefore, on a case-by-case basis, more or less suitable for handling certain types of data, depending at least partly on the data access patterns applied.
An RDBMS (Relational Database Management System) stores data in the form of related tables. This type of database is powerful because it requires few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways.
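As an illustration of how the same related tables can be viewed in different ways, the following is a minimal sketch using Python's built-in `sqlite3` module. The `customers`/`orders` schema and its rows are hypothetical examples, not taken from any particular system.

```python
import sqlite3

# In-memory SQLite database; the schema and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product TEXT,
    amount REAL
);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1, 'disk', 100.0), (2, 1, 'cpu', 250.0), (3, 2, 'disk', 100.0);
""")

# View 1: total spend per customer, obtained by JOINing the two related tables.
per_customer = conn.execute("""
SELECT c.name, SUM(o.amount)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.name ORDER BY c.name
""").fetchall()

# View 2: number of distinct buyers per product, from the same stored data.
per_product = conn.execute("""
SELECT product, COUNT(DISTINCT customer_id)
FROM orders GROUP BY product ORDER BY product
""").fetchall()

print(per_customer)  # [('Alice', 350.0), ('Bob', 100.0)]
print(per_product)   # [('cpu', 1), ('disk', 2)]
```

Neither query required the tables to be designed with that particular question in mind, which is the flexibility the relational model offers.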
One drawback of RDBMSs is that they are relatively slow when processing relational data. Another commonly known drawback of RDBMSs is that they tend to suffer from scalability limitations.
Nevertheless, RDBMSs have so far been commonly used within most industries and can be found in a wide range of products.
The NoSQL (“Not only SQL”) movement, pursued by e.g. Google, Facebook and Amazon, to name a few, has targeted other challenges, namely handling data that is not suitable for an RDBMS. These challenges include handling large volumes of data without using any specific schema. Due to the amounts involved, the data is normally distributed over several machines, which makes JOIN operations (commonly used in RDBMSs) difficult to perform efficiently. NoSQL databases are normally built on record storage (key-value stores). NoSQL storage cannot handle relational data models, but is well suited to handling large-scale data and real-time applications, and is used e.g. by Google, Facebook and Twitter.
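The effect of distributing records over several machines can be sketched with a toy key-value store that hash-partitions keys over a number of nodes. The class `PartitionedKVStore` and its methods are hypothetical and only model each node as an in-process dictionary; real distributed stores add replication, rebalancing and networking on top of this idea.

```python
import hashlib


class PartitionedKVStore:
    """Toy key-value store that hash-partitions keys over N nodes (hypothetical)."""

    def __init__(self, num_nodes: int) -> None:
        # Each "node" is modelled as a plain dict held in this process.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: str) -> int:
        # SHA-256 gives a stable key-to-node mapping across runs
        # (Python's built-in hash() of str is randomized per process).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value) -> None:
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str):
        # A point lookup by key touches exactly one node.
        return self.nodes[self._node_for(key)].get(key)

    def scan(self, predicate):
        # Any access that is not a key lookup, e.g. matching records on a
        # value field as a JOIN would, has to visit every node.
        return [(k, v) for node in self.nodes for k, v in node.items() if predicate(v)]


store = PartitionedKVStore(4)
store.put("user:1", {"name": "Alice", "city": "Oslo"})
store.put("user:2", {"name": "Bob", "city": "Paris"})
print(store.get("user:1"))
print([k for k, _ in store.scan(lambda v: v["city"] == "Oslo")])  # ['user:1']
```

Key lookups stay cheap however many nodes are added, while value-based matching grows with the number of nodes, which is why schema-free, key-oriented access suits this kind of storage.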
Graph databases consist of nodes, edges and properties, where every node contains a direct pointer to its adjacent elements, so no index lookup is needed. Compared with relational databases, graph databases are often faster for associative data sets and map more directly to the structure of object-oriented applications. Graph databases also scale better, since they typically do not need JOIN operations, and they are typically faster at calculating shortest paths between nodes.
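The shortest-path advantage follows from this adjacency structure: a traversal reads each node's neighbours directly instead of joining an edge table. The following is a minimal sketch of a breadth-first shortest-path search over an unweighted graph, where a Python dict stands in for the per-node pointers a graph database would store; the function name and the example graph are hypothetical.

```python
from __future__ import annotations

from collections import deque


def shortest_path(adjacency: dict[str, list[str]], start: str, goal: str) -> list[str] | None:
    """Breadth-first search over an unweighted adjacency list.

    Returns the list of nodes from start to goal, or None if goal is unreachable.
    """
    previous: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Follow the predecessor links back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        # Neighbours are read from the node's own entry; no JOIN is needed.
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": ["frank"],
}
print(shortest_path(graph, "alice", "frank"))  # ['alice', 'bob', 'dave', 'frank']
```

In a relational schema the same search would typically require one self-JOIN of an edge table per hop.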
In-memory databases primarily rely on a computer's main memory for data storage. Main memory databases are faster than disk-optimized databases since their internal optimization algorithms are simpler and execute fewer CPU instructions. They also reduce I/O when accessing data, which further improves the overall response time. This type of database is, however, often expensive and cannot store as much data as a distributed system can.
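Document-oriented databases are one of the main categories of so-called NoSQL databases. The central concept of a document-oriented database is the notion of a document, i.e. a self-contained record, typically encoded in a format such as JSON or XML, whose fields need not follow a fixed schema.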
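A minimal sketch of the document notion, assuming JSON-encoded documents held in a plain Python list that stands in for a collection. The `find` helper is hypothetical and only matches top-level fields; real document databases provide richer query languages and indexing.

```python
import json

# Two documents in the same hypothetical "users" collection. Unlike rows in a
# relational table, they do not have to share the same set of fields.
collection = [
    json.loads('{"_id": 1, "name": "Alice", "emails": ["a@example.com"]}'),
    json.loads('{"_id": 2, "name": "Bob", "address": {"city": "Paris"}}'),
]


def find(docs, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


print(find(collection, name="Bob"))
# [{'_id': 2, 'name': 'Bob', 'address': {'city': 'Paris'}}]
```

Each document carries its own nested structure, so related data (such as an address) is read together with the record instead of being joined in from another table.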
When determining which technology to choose, one or more of the following issues may therefore need to be considered, to varying extents:
- Amount of data stored
- Frequency of reads
- Frequency of writes
- Requirements on latency
- Requirements on consistency
- Requirements on availability
- Retention period
- Logical and physical structure of the data
- Online transaction processing (OLTP) versus decision support
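Several of these issues, such as the frequency of reads and writes and the amount of data written, can be measured from observed data accesses. The following is a minimal sketch that summarises a hypothetical access log; the `Access` record, its fields and the `profile` function are illustrative assumptions, and how such figures would be mapped to a choice of database technology is not shown.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Access:
    """One entry in a hypothetical access log."""
    timestamp: float  # seconds
    op: str           # "read" or "write"
    key: str
    size_bytes: int


def profile(log: list[Access]) -> dict:
    """Summarise some of the listed characteristics from an access log."""
    if not log:
        return {}
    # Observation window; fall back to 1 s if all entries share one timestamp.
    duration = max(a.timestamp for a in log) - min(a.timestamp for a in log) or 1.0
    ops = Counter(a.op for a in log)
    return {
        "reads_per_s": ops["read"] / duration,
        "writes_per_s": ops["write"] / duration,
        "read_ratio": ops["read"] / len(log),
        "bytes_written": sum(a.size_bytes for a in log if a.op == "write"),
        "distinct_keys": len({a.key for a in log}),
    }


log = [
    Access(0.0, "write", "k1", 100),
    Access(1.0, "read", "k1", 100),
    Access(2.0, "read", "k2", 50),
    Access(4.0, "read", "k1", 100),
]
print(profile(log))
```

For this log the window is 4 s, giving 0.75 reads/s, 0.25 writes/s and a read ratio of 0.75. Recomputing such a profile periodically is one conceivable way of noticing that an access pattern has changed.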
Therefore, the more complex the process of selecting a database technology becomes, the more evident the need for a more flexible and efficient selection process, not only when initially choosing the most suitable database technology, but also after a database technology has been chosen, when changed user behaviour, and a resulting change in how the database technology is used, can be identified.
Clearly, there is a problem of how to efficiently select the database technology that best fits the present data access pattern requirements and the needs of the applications that access the data, and also of how to determine when the database technology presently used should be replaced by another one more suited to a new scenario.
Today, this type of determination, e.g. that an application should preferably shift from a relational database technology to a key-value store, requires considerable effort from database experts. Some claim that the work of handling data access may amount to close to 80% of the total application developer time, and methods for simplifying this process are therefore needed.