The advent of the digital age and the advances in information and communications technologies have made it dramatically cheaper and faster to collect, create, share, process and store information.
This, in turn, has resulted in a data explosion, now commonly referred to as “big data”, which means in practice that many public and private structures now host very large databases of digital information.
These very large databases pose several challenges when one tries to analyze them, understand them and extract useful features from them. A first challenge is the computational cost of processing such large amounts of data: it is now estimated that the energy consumed by the data centers hosting and processing such databases has become a non-negligible fraction of the total energy consumption in Western countries. Another challenge is the scaling of the algorithms that process these data, which has led to a revival of machine learning, a subfield of computer science that is now extremely active.
In recent years, a promising way to reduce the computational cost and scale of these tasks has been identified. It consists in applying randomization algorithms to the initial data as a pre-processing step, thereby obtaining a smaller dataset on which less costly processing algorithms can be implemented. A randomization algorithm typically computes a number of random projections of the initial data, also called “random features”, by randomly mixing the initial data together, thereby retaining, in a smaller dataset, a large amount of the useful information contained in the initial data.
However, computing random projections usually involves generating very large random matrices and computing matrix products between these random matrices and the input data. These steps still require a large amount of processing power and memory and are thus bottlenecks for the use of randomization algorithms in data pre-processing.
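By way of illustration only, the random projection described above and its cost can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular implementation: the Gaussian entries, the 1/sqrt(out_dim) scaling, the 4-byte entry size and the names `random_projection` and `dense_matrix_bytes` are all hypothetical choices made for the example.

```python
import math
import random


def random_projection(data, out_dim, seed=0):
    """Project each row of `data` (n rows of length in_dim) onto out_dim random directions.

    Assumption: entries of the random matrix are i.i.d. Gaussian, scaled by
    1/sqrt(out_dim); other distributions are equally possible.
    """
    rng = random.Random(seed)
    in_dim = len(data[0])
    scale = 1.0 / math.sqrt(out_dim)
    # The dense random matrix: in_dim * out_dim entries must be drawn and stored.
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
              for _ in range(in_dim)]
    # The matrix product: n * in_dim * out_dim multiply-add operations.
    return [
        [sum(row[i] * matrix[i][j] for i in range(in_dim)) for j in range(out_dim)]
        for row in data
    ]


def dense_matrix_bytes(in_dim, out_dim, bytes_per_entry=4):
    """Memory needed just to hold the random matrix (assumed 4 bytes per entry)."""
    return in_dim * out_dim * bytes_per_entry


# Toy usage: 5 samples of dimension 50 reduced to dimension 10.
toy_rng = random.Random(42)
toy_data = [[toy_rng.random() for _ in range(50)] for _ in range(5)]
toy_projected = random_projection(toy_data, out_dim=10)

# At a larger (hypothetical) scale the matrix alone becomes very costly to store.
large_matrix_bytes = dense_matrix_bytes(1_000_000, 10_000)
```

With the hypothetical sizes used in the last line (inputs of dimension one million projected onto ten thousand random features), the random matrix alone would occupy 40 gigabytes before a single product is computed, which illustrates the bottleneck referred to above.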
Moreover, as an alternative to random projections, random embeddings, which map the data to a space of much higher dimension, have been shown to improve the classification of complex data. Examples include Extreme Learning Machines (ELM), which are considered a competitive alternative to convolutional neural networks. Here again, computing such a mapping to a higher-dimensional space requires a large amount of processing power and memory and is a bottleneck for the use of such data pre-processing for improved classification.
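A random embedding of the kind mentioned above can likewise be sketched as a fixed random linear map followed by a pointwise nonlinearity, in the manner of the hidden layer of an Extreme Learning Machine. The sketch below is illustrative only: the `tanh` nonlinearity, the Gaussian weights and biases, and the name `random_embedding` are assumptions made for the example, and the training of a classifier on the embedded data is deliberately left out.

```python
import math
import random


def random_embedding(data, out_dim, seed=0):
    """Map each row of `data` to out_dim random nonlinear features.

    Assumptions: Gaussian weights and biases, tanh nonlinearity. The weights
    are drawn once and never trained.
    """
    rng = random.Random(seed)
    in_dim = len(data[0])
    # out_dim * in_dim random weights: the cost grows with the target dimension.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(out_dim)]
    return [
        [math.tanh(sum(w * x for w, x in zip(weights[j], row)) + biases[j])
         for j in range(out_dim)]
        for row in data
    ]


# Toy usage: four 2-dimensional points embedded into 64 dimensions.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
embedded = random_embedding(points, out_dim=64, seed=1)
```

Here the output dimension (64) exceeds the input dimension (2), so the weight matrix and the associated matrix product grow with the size of the target space rather than shrinking with it, which is the source of the cost noted above.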
There is thus a need for an apparatus and a method that would provide a mixing of digital data without the aforementioned drawbacks.