In its most general form, a database refers to a set of data elements and the data model by which these data elements are interrelated. In conventional databases, the meaning or value of each data element is determined by its position in the data model. A data element representing the given name of a person in the database gets its proper meaning because it is stored under or allocated to a label such as ‘Given_Name’ in a table called ‘People’, for example. It is for this reason that in conventional database systems the first step in the implementation and deployment of the database is to produce a conceptual data model that reflects the complete structure of the information to be held in the database.
In conventional databases, a data element cannot be stored unless a specific header or label is provided to which the data element can be allocated. In other words, a data element can only be stored if a particular space for storing it has been defined beforehand.
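By way of illustration only, the following minimal sketch shows this constraint using Python's standard sqlite3 module. The ‘People’ table and the ‘Given_Name’ label follow the example given earlier; the ‘Mobile_Number’ label and the sample values are assumptions introduced for illustration.

```python
import sqlite3

# In-memory database with a single predefined label (column).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Given_Name TEXT)")

# Storing a value under an existing label succeeds.
conn.execute("INSERT INTO People (Given_Name) VALUES (?)", ("Alice",))

# Storing a value under a label that was never defined is rejected.
# 'Mobile_Number' is a hypothetical label, not part of the schema above.
try:
    conn.execute("INSERT INTO People (Mobile_Number) VALUES (?)", ("555-0100",))
    stored = True
except sqlite3.OperationalError as exc:
    stored = False
    error_message = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM People").fetchone()[0]
```

The second insertion fails because no space for the new data element was defined beforehand, so only the first row is stored.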
In a dynamic application environment, with ever larger data sets to be stored, creating such ‘complete’ data models is difficult and time consuming, and the models require frequent adaptation, as the number of items representing a new and unforeseen meaning of a data element readily expands over time. With the introduction of mobile telephony, for example, the need arose to expand the data model to store a mobile telephone number next to a land line telephone number. Those skilled in the art will appreciate the difficulties encountered when expanding the items to be stored in a conventional data model, such as but not limited to data inconsistencies, ambiguous models, and data multiplication, all potentially harmful points of failure.
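The mobile telephony example may be sketched as follows, again using Python's standard sqlite3 module. The column names ‘Telephone’ and ‘Mobile_Telephone’ and all sample values are assumptions made for illustration; the sketch is not intended to represent any particular production system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original data model: only a land line telephone number was foreseen.
conn.execute("CREATE TABLE People (Given_Name TEXT, Telephone TEXT)")
conn.execute("INSERT INTO People VALUES (?, ?)", ("Alice", "020-555-0100"))

# The data model has to be adapted before a mobile number can be stored.
conn.execute("ALTER TABLE People ADD COLUMN Mobile_Telephone TEXT")
conn.execute(
    "INSERT INTO People VALUES (?, ?, ?)",
    ("Bob", "020-555-0101", "06-555-0102"),
)

# Records stored before the adaptation have no value for the new item.
rows = conn.execute(
    "SELECT Given_Name, Telephone, Mobile_Telephone "
    "FROM People ORDER BY Given_Name"
).fetchall()
```

After the adaptation, the record stored earlier holds an empty (NULL) mobile number, which is one simple way in which inconsistencies can arise when a data model is expanded.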
Having produced a conceptual data model, the next step is to translate this model into a form that actually implements the relevant items in the database. This process is often called the logical database design, and the output is a logical data model expressed in the form of a schema. Whereas the conceptual data model is (in theory at least) independent of the choice of database technology, the logical data model will be expressed in terms of a particular database technology.
At present, the most popular database model for general-purpose databases is the relational model, using a table-based format. The process of creating a logical database design using this model involves a methodical approach known as normalization. The goal of normalization is to ensure that each elementary ‘Item’ is only recorded in one place, so that insertions, updates, and deletions automatically maintain consistency.
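The effect of normalization may be illustrated with the following minimal sketch, using Python's standard sqlite3 module. The ‘Orders’ table, the ‘City’ and ‘Product’ columns and the sample values are hypothetical and serve only to show that an item recorded in one place needs to be updated in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The city of a person is recorded once, in People, and is referenced
# from Orders by key instead of being copied into every order.
conn.executescript(
    """
    CREATE TABLE People (
        Person_Id  INTEGER PRIMARY KEY,
        Given_Name TEXT,
        City       TEXT
    );
    CREATE TABLE Orders (
        Order_Id  INTEGER PRIMARY KEY,
        Person_Id INTEGER REFERENCES People(Person_Id),
        Product   TEXT
    );
    """
)
conn.execute("INSERT INTO People VALUES (1, 'Alice', 'Utrecht')")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10, 1, "Book"), (11, 1, "Lamp")],
)

# A single update is sufficient; every order sees the new value.
conn.execute("UPDATE People SET City = 'Delft' WHERE Person_Id = 1")
cities = conn.execute(
    "SELECT Orders.Order_Id, People.City "
    "FROM Orders JOIN People ON Orders.Person_Id = People.Person_Id "
    "ORDER BY Orders.Order_Id"
).fetchall()
```

Had the city been copied into each order record, the update would have had to be repeated for every copy, with the risk of leaving some copies unchanged.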
Besides the relational model, and without aiming to be complete, other known types of database models include the hierarchical database model, the network model, the object model, the document model, the array model, and the semantic model.
Data that resides in a fixed field within a record or file is called structured data. Examples are data contained in relational databases and spreadsheets. Information that cannot be readily classified and does not fit into a particular box or a traditional row-column database is called unstructured data. Examples of unstructured data are photos, graphic images, presentations, emails, and word processing documents. Unstructured data files often include text and multimedia content. Note that while these types of files may have an internal structure, they are still considered ‘unstructured’ because the data they contain does not fit neatly into a rigid data model structure.
Semi-structured data is a mix of structured and unstructured data. It is a type of structured data that lacks a strict data model structure. With semi-structured data, tags or other types of markers are used to identify certain elements within the data, but the data does not follow a rigid structure. For example, word processing software can now include metadata showing the author's name and the date created, while the bulk of the document is just unstructured text.
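A minimal sketch of semi-structured data is given below, using JSON and Python's standard json module. The tag names ‘author’, ‘created’, ‘keywords’ and ‘body’ and the sample contents are assumptions chosen for illustration only.

```python
import json

# Two documents carrying tagged metadata next to free text.
# The set of tags differs per document; no fixed schema is imposed.
raw_documents = [
    '{"author": "A. Writer", "created": "2015-03-01", "body": "Free running text."}',
    '{"author": "B. Editor", "keywords": ["draft"], "body": "More free text."}',
]
documents = [json.loads(raw) for raw in raw_documents]

# The tags identify certain elements (here the metadata), while the
# 'body' element remains unstructured text.
metadata = [
    {tag: value for tag, value in doc.items() if tag != "body"}
    for doc in documents
]
tags_per_document = [sorted(doc.keys()) for doc in documents]
```

Both documents can be held side by side although their tags differ, which would not be possible in a rigid row-column structure without adapting the data model first.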
While a particular database model may be optimal for storing one or another type of data, in practice the known database modelling techniques all suffer, to a greater or lesser extent, from the problems involved with the growth in the amount of data and in the new types of data that are created over time in a dynamically evolving organization.
When querying a relational database, for example, that is, when making a request to retrieve information stored in the database, no relations between data elements can be revealed other than those defined by the data model structure. Further, many database systems require requests for information to be made in the form of a stylized query written in a special query language. This is the most complex way of requesting information, because it forces users to learn a specialized language.
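By way of example, a stylized query in the SQL query language may look as sketched below, executed here through Python's standard sqlite3 module. The ‘City’ column and the sample values are hypothetical. Note that the user must know both the query syntax and the exact table and column names of the data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Given_Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("Alice", "Utrecht"), ("Bob", "Delft"), ("Carol", "Utrecht")],
)

# The request must be phrased in the query language and in terms of
# the labels defined by the data model.
query = "SELECT Given_Name FROM People WHERE City = ? ORDER BY Given_Name"
names = [row[0] for row in conn.execute(query, ("Utrecht",))]
```

The query can only relate data elements through the labels ‘Given_Name’ and ‘City’ that the data model defines.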