The desire to store and analyze large amounts of data, once restricted to a few large corporations, has escalated and expanded. Much of this data is similar to the data that was traditionally managed by data warehouses, and as such, it could reasonably be stored and processed in a relational database management system (RDBMS). More and more often, however, data is not stored in an RDBMS. Rather, the data is stored in different systems, including those that do not entail a predefined and rigid data model. For example, data may be stored and managed in a non-relational format, such as in a distributed file system (for example, HDFS, used in the Hadoop framework), and analyzed with components such as MapReduce, among others. As a result, the two kinds of data differ in form: data stored by an RDBMS is relational, whereas data stored by Hadoop is non-relational.
Although relational and non-relational data were handled as separate endeavors for a long time, people are no longer satisfied with this situation. In particular, people analyzing relational data also want to analyze non-relational data, and they want to analyze combinations of both types of data. Similarly, people analyzing non-relational data want to combine it with relational data stored in an RDBMS. Still further, even people analyzing data in an RDBMS may want to use tools like MapReduce, which is typically associated with processing non-relational data, for certain tasks. Keeping data in separate silos is no longer viable.
Various solutions have emerged that enable both relational and non-relational data to be stored and analyzed efficiently and without barriers. One such system is Polybase, which is a feature of an RDBMS parallel data warehouse that provides a single relational view with SQL (Structured Query Language) over both relational and non-relational data.
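The idea of a single relational view over both kinds of data can be illustrated with a minimal sketch. It is not Polybase and does not reflect how Polybase is implemented: it uses Python's built-in sqlite3 module as a stand-in RDBMS, and the log-line format, the table names (customers, events), and the helper functions are all hypothetical. The sketch also copies the parsed lines into a temporary table, whereas a system of the kind described above would typically leave the non-relational data in place.

```python
import sqlite3

# Hypothetical non-relational input: raw, schema-less log lines such as might
# sit in a distributed file system. The line format is an assumption.
RAW_LOG_LINES = [
    "2024-01-01 user=1 action=view",
    "2024-01-01 user=2 action=buy",
    "malformed line without fields",
    "2024-01-02 user=1 action=buy",
]


def parse_line(line):
    """Impose a schema at read time; return None for lines that do not fit."""
    parts = line.split()
    if len(parts) != 3:
        return None
    day, user_field, action_field = parts
    if not user_field.startswith("user=") or not action_field.startswith("action="):
        return None
    user_text = user_field[len("user="):]
    if not user_text.isdigit():
        return None
    return day, int(user_text), action_field[len("action="):]


def build_unified_view(conn, raw_lines):
    """Create a relational table and expose the raw lines as a second table."""
    # Relational side: an ordinary table with a predefined schema.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )
    # Non-relational side: parsed lines loaded into a temporary table so that
    # SQL can address them; lines that do not parse are skipped.
    conn.execute("CREATE TEMP TABLE events (day TEXT, user_id INTEGER, action TEXT)")
    rows = [row for row in map(parse_line, raw_lines) if row is not None]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)


def purchases_per_customer(conn):
    """Run one SQL query that joins the two sources."""
    query = """
        SELECT c.name, COUNT(*) AS purchases
        FROM customers AS c
        JOIN events AS e ON e.user_id = c.id
        WHERE e.action = 'buy'
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_unified_view(conn, RAW_LOG_LINES)
result = purchases_per_customer(conn)
print(result)
```

The point of the sketch is only that, once the non-relational data is given a tabular shape, a single SQL statement can combine it with data that was relational from the start.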