Computing systems that host online services, such as web servers, create and modify data structures for logging user visits and other interactions with those services. These interactions can be logged for various reasons, including data security and content customization. To log them, web servers store log files that record the events or transactions performed on a website. A log file typically includes information such as a visitor identification number and the time the visitor navigated to a particular site. Such log files are usually organized by transaction date and split into multiple files when they inevitably grow too large.
Data queries for interaction data objects are used to identify and analyze sources of electronic interactions with websites and other online services. For example, a website operator may query a server for the set of transactions associated with a single website user. However, traditional logging methods make satisfying such queries computationally expensive and data-intensive.
For instance, log files typically consist of a time-indexed list of interaction data objects from a set of user devices, in which the objects identifying user device interactions are ordered by the date and time of each interaction. This sequential logging, coupled with the fact that a user's visits to the website are likely spaced hours, days, or even weeks apart, greatly reduces the likelihood that two data objects describing interactions from the same user are stored in the same file. Furthermore, such log files, which could include every historical transaction with a website or other online service, can be enormous, often reaching petabytes of data. Additionally, log files are inevitably split at arbitrary points into multiple files, and locating and reading data across those files requires additional computing resources. Because millions of users can visit a website in one day, the complete data set, which is not organized by user, is spread across potentially hundreds or thousands of files.
These deficiencies result in slower search times when servicing queries about particular users. For example, when searching for interaction data about one user, a full scan of millions of records in many files could be required, because there is a low probability that a given user's interactions are stored contiguously in a file that records interactions from millions of users in time order. These scattered files can have sizes on the order of terabytes (10¹² bytes) or petabytes (10¹⁵ bytes). Consequently, a search for a particular user's data requires devoting processing resources to searching these large files across many different storage nodes.
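To make this cost concrete, the following minimal Python sketch models time-indexed logging under simplifying assumptions: each "file" is an in-memory list, each record carries only a hypothetical `timestamp` and `visitor_id`, and files are split at a fixed, hypothetical record count. It is an illustration of the problem described above, not a description of any particular logging system.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    # Hypothetical minimal record: seconds since the start of logging, and a visitor ID.
    timestamp: int
    visitor_id: int


def write_time_indexed_logs(interactions, max_records_per_file):
    """Order records by time and split them into fixed-size 'files' (lists).

    The split point depends only on record count, not on the visitor,
    mirroring the arbitrary splitting of time-ordered log files.
    """
    ordered = sorted(interactions, key=lambda r: r.timestamp)
    return [
        ordered[i:i + max_records_per_file]
        for i in range(0, len(ordered), max_records_per_file)
    ]


def find_user_interactions(log_files, visitor_id):
    """Full scan: nothing is keyed by visitor, so every record must be inspected.

    Returns the matching records, the number of records scanned,
    and the number of distinct files that contained a match.
    """
    matches = []
    records_scanned = 0
    files_with_matches = set()
    for file_index, log_file in enumerate(log_files):
        for record in log_file:
            records_scanned += 1
            if record.visitor_id == visitor_id:
                matches.append(record)
                files_with_matches.add(file_index)
    return matches, records_scanned, len(files_with_matches)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical workload: 100,000 interactions from 10,000 visitors over 30 days.
    thirty_days = 30 * 24 * 3600
    interactions = [
        Interaction(rng.randrange(thirty_days), rng.randrange(10_000))
        for _ in range(100_000)
    ]
    log_files = write_time_indexed_logs(interactions, max_records_per_file=5_000)
    matches, scanned, touched = find_user_interactions(log_files, visitor_id=42)
    print(f"{len(log_files)} files, {scanned} records scanned, "
          f"{len(matches)} matches spread over {touched} files")
```

Because the file layout is keyed only by time, the lookup inspects every record no matter how few belong to the queried visitor, and that visitor's handful of records tends to land in several different files.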
Accordingly, solutions are needed to more efficiently store and access user interaction data objects.