The desire to store and analyze large amounts of data, once restricted to a few large corporations, has escalated and expanded. Much of this data is similar to the data traditionally managed by data warehouses and, as such, could reasonably be stored and processed in a relational database management system (RDBMS). Increasingly, however, data is not stored in an RDBMS. Rather, it is stored in other systems, including those that do not entail a predefined and rigid data model. One example is Hadoop, in which data is stored in a distributed file system (the Hadoop Distributed File System, or HDFS) and analyzed with components such as MapReduce, among others. Although not strictly accurate, data stored outside an RDBMS, such as in a file system like HDFS, is often termed unstructured, while data inside an RDBMS is called structured.
Although working with structured data and working with unstructured data were long treated as separate endeavors, people are no longer satisfied with this situation. In particular, people analyzing structured data also want to analyze related unstructured data, as well as combinations of both types of data. Similarly, people analyzing unstructured data want to combine it with related data stored in an RDBMS. Still further, even people analyzing data in an RDBMS may want to use tools like MapReduce for certain tasks. Keeping data in separate silos is no longer viable.
Various solutions have emerged that enable both structured and unstructured data to be stored and analyzed efficiently and without barriers. One such system is Polybase, a feature of an RDBMS parallel data warehouse that provides a single relational view, queried with SQL (Structured Query Language), over both structured and unstructured data.