Databases are widely used for data storage in many application domains. Data may be accessed efficiently from databases by sending queries to the database using a suitable query language. The query results are then used by downstream processes for the particular task involved. However, the downstream processes typically expect query results from the database in a particular form. This means that intermediate or linking software typically needs to be written to export data from existing databases and manipulate that data before input to the downstream process. As a result, data access from the database by the downstream process may be inefficient. Also, the intermediate or linking software typically needs to be specially written for the particular downstream process, which is time-consuming and error-prone. These problems are particularly acute in application domains in which large amounts of data are used and where missing values, outliers, erroneous values and other problems with the data occur.
Such application domains include machine learning applications, in which it is often beneficial to use huge amounts of data because this enables better learning outcomes to be achieved. However, any inefficiencies in database access by intermediate or linking software are exacerbated where huge amounts of data are to be accessed.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known database access systems.