Large datasets must be stored and processed in many different contexts. Optimizing how the data is stored can reduce the amount of storage required. It can also enable faster processing of the data, for example, to extract certain desired information from it.
With an ever-increasing amount of data on consumer shopping behaviour in e-commerce applications and websites, there is a clear need to analyze this data to infer consumer interests, shopping preferences, and the products that best match each consumer. A number of applications and algorithms have been proposed and applied to subsets of the data collected by e-commerce applications. These include collaborative filtering [1] and the clustering of customers into segments, each of which receives its own product recommendations [2]. The essential idea behind these methods is to use the data to predict the customer's intent and to suggest a product on that basis. A good suggestion leads to a higher conversion rate and, ultimately, to increased sales for the e-commerce app or site.
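As a rough illustration of what collaborative filtering can look like, the following Python sketch implements one common variant, item-based collaborative filtering with cosine similarity over binary user-item interactions. It is not necessarily the method of [1]. The input format (a dict from user id to the set of items that user bought or viewed), the function names, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items from binary user-item interactions.

    interactions: dict mapping user id -> set of item ids (hypothetical format).
    Returns a dict mapping (item_a, item_b) -> similarity, with item_a < item_b.
    """
    # Invert the data: for each item, which users interacted with it.
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)

    sims = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                # Cosine similarity of two binary vectors.
                norm = math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = overlap / norm
    return sims


def recommend(user, interactions, sims, k=3):
    """Rank unseen items by summed similarity to items the user already has."""
    seen = interactions.get(user, set())
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
        elif b in seen and a not in seen:
            scores[a] += s
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical toy interactions, for illustration only.
interactions = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a"},
}
sims = item_similarities(interactions)
print(recommend("u4", interactions, sims))  # ['b', 'c']
```

Here user `u4` has only interacted with item `a`. Item `b` is recommended first because it shares more users with `a` than item `c` does.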
A common way of analyzing e-commerce data has been to apply machine learning techniques such as deep learning [11] and support vector machines [4]. For example, [5] used multiple recurrent neural networks to categorize e-commerce items. These methods analyze the data based on prior user sessions, which can be readily obtained. Using such approaches to personalize recommendations has proven effective, as shown in [6] and [7]. In [7], feature engineering was applied to e-commerce data from Alibaba's T-mall dataset to classify buyers as either repeat buyers or non-repeat buyers.
An interesting approach was taken by [3], in which a data structure specifically designed for optimized searching and matching was used to analyze e-commerce data. The idea was to perform part of the computation before the data is stored, enabling more efficient real-time analysis during new user sessions. A similar approach was taken in [8], which used pair-wise co-occurrences for retail data mining. Essentially, [8] used the joint distribution of pairs of variables to understand retail behaviours.
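To make the pair-wise co-occurrence idea concrete, the following Python sketch counts how often each unordered pair of items appears in the same transaction. It then turns those counts into empirical joint probabilities. The sketch assumes that each transaction is a list of item identifiers. The function names and toy data are hypothetical, and this is not claimed to be the specific method of [8]. Because the pair counts are built once and only looked up afterwards, the sketch also echoes the idea in [3] of performing part of the computation before the data is stored.

```python
from collections import Counter
from itertools import combinations


def pairwise_cooccurrence(baskets):
    """Count how often each unordered pair of items shares a basket.

    Returns (pair_counts, n_baskets). Pairs are stored as sorted tuples so
    that ("milk", "bread") and ("bread", "milk") map to the same key.
    """
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket.
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    return pair_counts, len(baskets)


def joint_probability(pair_counts, n_baskets, a, b):
    """Empirical estimate of P(a and b appear in the same basket)."""
    if n_baskets == 0:
        return 0.0
    key = tuple(sorted((a, b)))
    return pair_counts[key] / n_baskets


# Hypothetical toy transactions, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
counts, n = pairwise_cooccurrence(baskets)
print(counts[("bread", "milk")])                      # 2
print(joint_probability(counts, n, "milk", "bread"))  # 0.5
```

A basket with k distinct items contributes k(k-1)/2 pairs, so the table can grow quickly for large baskets. In practice, it may be pruned to frequent pairs.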