The present invention relates to online analytical processing (OLAP) systems, and in particular, to OLAP systems that efficiently secure and authorize multidimensional transactional data.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
In authorization concepts for securing transactional data in multidimensional repositories, the transactional data is restricted from being accessed via multidimensional ordering criteria (master data). An example of this can be seen in U.S. Patent Application Pub. No. 2005/0257066. Access may be attempted by online analytical processor (OLAP) devices (e.g., implemented by application servers) when accessing data in databases (e.g., implemented by database servers). There are two general paradigms in the treatment of authorizations (or restrictions) of certain combinations of authorizations: a “filter” paradigm and a “barrier” paradigm.
In the filter paradigm, data is filtered before being processed for display. For example, the OLAP system sends a request for data to the database system, and the request includes information regarding the requesting user. The database system uses authorization data to execute a filter on the data before providing the filtered data to the OLAP system. If the user is authorized to access particular data by the authorization data, then the data provided to the OLAP system includes only that particular data; all other data is not provided to the OLAP system.
In the barrier paradigm, the selection set (piece of data) that will be loaded from the database is checked as a whole before it is actually loaded. The answer may be “yes” or “no” and depending on that, the data is processed or not. If not, a message is sent indicating “no authorizations”. For example, an authorization barrier system may be implemented between the OLAP system and the database system. Upon an access request by the OLAP system, the authorization barrier system checks whether the user is authorized to access all the data requested. If so, then all data requested is provided, but if not, no data is provided.
TABLE 1 provides and example that illustrates data and authorizations. TABLE 1 shows “2-dimensional” data available for “Material” and “Customer” and as key figure its “Revenue”. The term “transactional data” refers to the numbers (e.g. 100$ revenue). A single piece of transactional data may be referred to as an “element”. The term “master data” refers to the names of the customers and the type of the material.
TABLE 1CustomerMaterialRevenueAronAntibiotic100$BushAntibiotic50$BushBrush20$DeckerBrush200$MillerDesk400$NormanDesk500$NormanTable200$AronTable123$AronDesk456$
Note that all customers may have bought all materials, but in practice, of course many customers may have bought some materials but not all. So, even if there are 100 customers and 50 materials there may be less than 100×50=5000 combinations but the maximum may be 5000 data sets.
Assume that a user is reporting on this revenue data and sending a query (to get a report) to the multidimensional query system (e.g., an OLAP system). Further assume that the user is restricted to a couple of combinations (subsets). He is authorized to access Customers between Aron and Decker for Materials between Antibiotic and Brush, and is authorized to access Materials between Desk and Table for Customers Miller and Norman. So he may see the revenue data from Bush who bought a Brush but not the revenue which was created by Aron who bought a Desk.
Note that the transactional data (the “numbers” or key figures) is protected, and not the names for material or users or the combinations themselves. (Such may be referred to as “master data protection”, which is not the task of an OLAP engine). So the terminology “access to” refers to access to the key figures that are “labeled” or categorized by the combinations of master data values (“names” for user and material).
TABLE 2TABLE 2 describes the above authorizations as twointerval combinations:CustomerMaterialAuthorization1[Aron, Decker][Antibiotic, Brush]Authorization2[Miller, Norman][Desk, Table]
Given the data in TABLE 1 and the authorizations of TABLE 2, reports may be generated according to the two paradigms, as described below.
The first report is a drilldown of Customer and Material, with a sum for each Customer and a Total Sum. TABLE 3 shows the case where the user has full authorization and thus shows all the data:
TABLE 3CustomerMaterialRevenueAronAntibiotic100$Table123$Desk456$Sum679$BushAntibiotic50$Brush20$Sum70$DeckerBrush200$Sum200$MillerDesk400$Sum400$NormanDesk500$Table200$Sum700$Total2049$
Under the filter paradigm using the authorizations of TABLE 2, a report according to TABLE 4 may result:
TABLE 4CustomerMaterialRevenueAronAntibiotic100$Sum100$BushAntibiotic50$Brush20$Sum70$DeckerBrush200$Sum200$MillerDesk400$Sum400$NormanDesk500$Table200$Sum700$Total1470$
As discussed above, in the filter paradigm the system loads all available data and before displaying it to a restricted user, it filters out (quits) all data sets that are not authorized. The result is shown in TABLE 4. Note that only those data sets that are authorized are displayed. The missing data is (Aron, Table, 123$) and (Aron, Desk, 456$) as it was filtered out. Further note that the total differs (1470$ versus 2049$).
The filter paradigm is easy to define and the user sees always data, maybe less than expected, but never “no authorization”.
One disadvantage of the filter paradigm is with many data sets like millions or even hundreds of million of data sets this becomes very slow as all data has to be loaded (and touched) to be quit in the result set. This is usually done on the database before loading data into the system, but still all data has to be analyzed in the SQL (structured query language) execution.
Another disadvantage of the filter paradigm is that the data is user dependent. For example, if User 1 is authorized according to TABLE 2 and User 2 is fully authorized, User 1 sees 1470$ total revenue whereas User 2 sees 2049$. As a more extreme example, revenue for Europe may be 100 million for User 1, and 120 million for User 2, because data rows with collected 20 million revenue have been filtered out.
Under the barrier paradigm using the authorizations of TABLE 2, no report will result. Without any further restriction in the query definition, using the authorization restrictions of TABLE 2 results that the access is not granted, usually with a message like “no authorization”. The reason is that the report selects all Materials and all users without restriction which we have not taken into account. But not everything is authorized as can be seen in the authorizations (TABLE 2).
An alternative is to restrict the query (or the report) by an additional selection restriction, but it already becomes difficult to define the two 2-dimensional authorizations as restrictions (since they are not concave, i.e. no rectangle).
Thus, according to the barrier paradigm, a number or key figure is either visible or not, but it does not depend on the user. Returning to the extreme example above (Revenue for Europe), if the authorization for two users is such that they both receive data (i.e., the error message “no authorization” does not occur), the summary values displayed are the same for both users viewing the report, even if their specific authorizations differ.
One disadvantage of the barrier paradigm is that since the interval combination of the example (TABLE 2) does not authorize the combinations of Aron-Decker and Desk-Table, restricting the query (or the report) by an additional selection restriction is an extremely difficult task and usually leads always to an answer “no authorization”.
TABLE 5 shows the format (i.e., it includes no data) of a second type of example report. It shows a view with Customer in x-direction and Material in y-direction, giving a table with revenue numbers for the corresponding cells:
TABLE 5CustomerMaterialAronBushDeckerMillerNormanAntibioticBrushDeskTable
How is this handled according to the existing paradigms and how are the cells filled?
In the filter paradigm, the report depends upon the implementation; one example is TABLE 6 (using the data from TABLE 1 and the authorization from TABLE 2):
TABLE 6CustomerMaterialAronBushDeckerMillerNormanSUMAntibiotic100$50$150$Brush20$200$220$Desk400$500$900$Table200$200$SUM100$70$200$400$700$1470$ 
The cells without data may be empty (as shown in TABLE 6). Notice that the “emptiness” of the cells has different reasons for different cells. Brush/Aron for instance is empty as it is not booked. Table/Aron is not authorized (see TABLE 2) but may booked. This information is not shown to the user. Thus in the filter paradigm it is not possible at all to distinguish between these two reasons for an empty cell. So data may be missing (no data consistency) or not booked or not authorized. If there is another (invisible) dimension, say color, and User 1 has not authorized “blue”, but User 2 does have authorized “blue”, then the numbers may even differ, depending upon whether User 1 or User 2 looks at them.
In the barrier paradigm, the usual result is “no authorization” and no data is displayed at all. As discussed above regarding the first example report, it is difficult to craft additional selection restrictions such that data results. Most selections are not restricted to the authorized “area” and thus result in “no authorization”.
As another example, consider a third report that includes only Customer in Drilldown with an aggregation (sum) over Material. TABLE 7 gives an example of the third type of report using the data from TABLE 1 assuming all authorizations have been granted:
TABLE 7CustomerRevenueAron679$Bush70$Decker200$Miller400$Norman700$SUM2049$
In the filter paradigm, TABLE 8 shows the third report (e.g., format similar to TABLE 7) generated according to the data of TABLE 1 and the authorizations of TABLE 2:
TABLE 8CustomerRevenueAron100$Bush70$Decker200$Miller400$Norman700$SUM1470$
The revenue numbers are user dependent: they depend on what the user has authorized. As a specific example, some of the revenue for Customer Aron (123$ and 456$, see TABLE 1) is not displayed due to the authorizations (TABLE 2). The revenue is summed over only the authorized values, contradicting transparency or data consistency requirements. The revenue that Customer Aron caused thus depends on the user looking at it. It is not apparent from the report which partial revenues have contributed or even that the displayed revenue sum may not be complete or accurate.
In the barrier paradigm, as with the second report, the usual result is “no authorization” and no data is displayed at all.