The handling of “big data” is popular for affording parallelized large-scale data processing and is employed in a great variety of settings. The mining and processing of data in this setting have proven to be valuable to businesses and other entities, e.g., in determining or understanding a potential customer base. Data can originally derive from a great variety of sources, including social media, news and other online sources. The data may then be processed in large-scale distributed parallel computing systems, such as Hadoop® clusters. Hadoop® is an open source implementation of MapReduce by Google®, and is a registered trademark of the Apache Software Foundation.
The desirability of providing a search and browsing interface for data, such as in settings as just described, has long been noted. However, conventional efforts in that connection have generally fallen short in providing a significant degree of versatility and utility, inasmuch as individual assets (or specifically defined sets of data) tend only to be searchable independently.