It is often necessary to answer specific analytic questions that involve a particular range (or ranges) of values of some entity in the data. This capability is often required to meet one or more of the following requirements:
    self-service (no need for administrators or the IT department to be involved)
    non-technical (no special skills or knowledge required)
    easy-to-use
    data source agnostic (no specific set up requirements of source data)
    interface agnostic (any number of ways of accessing these analytic capabilities)
    reproducible (once set up these can be re-run with new data and re-used by others)
    fast
It is difficult to identify segments of a population that meet certain criteria and then use those segments in further analysis. The following are some examples of typical business questions that require answers:
    HealthCare: What is the total Paid Amount generated by High Cost Claimants (where a High Cost Claimant is a Patient with over $30,000 in total claims)?
    HealthCare: What is the total Paid Amount, and how does it compare between Claimants that had different numbers of Inpatient Admits (1 vs. 2 vs. 3 vs. more than 3)?
    HealthCare: How does the number of Drug Prescriptions compare between Claimants that had different numbers of Inpatient Admits (1 vs. 2 vs. 3 vs. more than 3)?
    Sales: What is the total number of Products sold, and their total Cost, for each tier of Sales People (where the tiers are “Sales People with Less than $10,000 in total sales per person”, “Sales People with $10,000-$50,000 in total sales per person”, and “Sales People with Over $50,000 in total sales per person”)?
    Retail: What are the most common products purchased by people who spend more than $500 per visit to the supermarket, and how do the most common products this year compare with last year?
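The first HealthCare question above can be illustrated with a minimal sketch. It is not taken from the described system: the record layout, the sample claims, and the function names (`total_by_patient`, `assign_band`, `paid_amount_by_band`) are all hypothetical, and only the $30,000 threshold comes from the example itself. The sketch shows the two passes such a question implies: first aggregate per Patient to decide the band, then aggregate the measure per band.

```python
from collections import defaultdict

# Hypothetical claim records: (patient_id, paid_amount). Illustrative data only.
claims = [
    ("P1", 20000.0),
    ("P1", 15000.0),
    ("P2", 5000.0),
    ("P3", 31000.0),
    ("P3", 1000.0),
    ("P4", 30000.0),
]

# From the example: a High Cost Claimant has over $30,000 in total claims.
HIGH_COST_THRESHOLD = 30000.0


def total_by_patient(records):
    """Pass 1: aggregate the paid amount for each patient."""
    totals = defaultdict(float)
    for patient_id, paid in records:
        totals[patient_id] += paid
    return dict(totals)


def assign_band(total):
    """Place a patient's aggregate total into a band (strictly over the threshold)."""
    return "High Cost Claimant" if total > HIGH_COST_THRESHOLD else "Other"


def paid_amount_by_band(records):
    """Pass 2: sum the paid amount of every claim under its patient's band."""
    band_of = {pid: assign_band(t) for pid, t in total_by_patient(records).items()}
    result = defaultdict(float)
    for patient_id, paid in records:
        result[band_of[patient_id]] += paid
    return dict(result)


band_totals = paid_amount_by_band(claims)
print(band_totals)
```

In this sample data, patient P4 totals exactly $30,000 and so falls outside the High Cost Claimant band, because the definition requires a total over the threshold.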
The mechanism to do this type of analysis needs to be able to work for data of any nature:
    any industry
    any division within a company
    any volume of data
    on any entity dimension even if it has very high cardinality (e.g. Customer ID, Individual ID, or even Transaction ID)
Self-service: It is one thing for programmers and IT staff to be able to create these types of segmentation rules in advance based on defined user requirements, but ideally End Users should be able to create these segmentation rules themselves with no help or resources from any programmer or IT staff member. It is also important that End Users can create these rules for immediate “on-the-fly” analysis, so that they can use a rule the moment they have defined what they are interested in.
Easy-to-use: This type of analysis should be possible without any specialist technical or programming knowledge. That is, an End User should be able to answer all these types of questions without needing to understand where or how the data is stored or structured. They should not need to know how to program in any query language (such as SQL or MDX).
The user should be able to answer these business/analytical questions via a simple point-and-click interface, without being concerned with the complexities of the processing required to answer their question.
Data Source Agnostic: These analytical methodologies should not depend on what query engine is used, or how the data is stored or structured.
It also should not matter if End Users set up their metadata for these types of analysis and, over time, the data source changes to a different structure or database. If the underlying data source changes, then any analyses that End Users have created in the past should still work and should seamlessly switch over to use the new data sources.
“Creation Data Source” and “Application Data Source” Independence: The data source used to create the segmentation rules may not be the data source to which the derived relationship is to be applied. For example:
    HealthCare: To analyze drug usage of patients with various frequencies of ER visits, it is necessary to use Outpatient Claims data (which contains ER visits) to identify patients and their corresponding number of ER visits, grouping them into bands (e.g. 1-2 ER visits, 3-5 ER visits, more than 5 ER visits). Then, to analyze drug usage by these patients, it is necessary to apply the bands (the Mapping Relationship between the Patient IDs and the Bands) to the Pharmacy claims data, which contains the Patient ID, to see how many drug prescriptions per patient were issued in the different bands.
    HealthCare: To analyze Cost per Member based on the Size of the Family, it is necessary to define the Bands based on the Enrollment data, which contains all members of the plan, including their family members covered by the plan. It is then possible to define the bands to be various family sizes (e.g. Single, Plus One, Family of 3-4, Family with more than 4 members), and then apply these bands to any claims data (which contains Member ID) to analyze cost per member or other claim-relevant information based on the Family size of the plan subscriber.
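The ER visit example can be sketched as follows, purely as an illustration under stated assumptions. The record layouts, sample data, and function names (`er_visit_band`, `build_mapping`, `prescriptions_per_patient`) are hypothetical; the band boundaries are those given in the example. The sketch builds the Mapping Relationship from one source and applies it to another, and the only thing carried between the two sources is the small Patient ID to Band mapping. Choosing the banded patients as the denominator is also an assumption.

```python
from collections import Counter

# Hypothetical "creation" source: outpatient records as (patient_id, is_er_visit).
outpatient = (
    [("A", True)] * 2
    + [("B", True)] * 4
    + [("B", False)]
    + [("C", True)] * 6
    + [("D", False)]
    + [("F", True)]
)

# Hypothetical "application" source: pharmacy records as (patient_id, prescription_id).
pharmacy = [
    ("A", "rx1"), ("A", "rx2"),
    ("B", "rx3"),
    ("C", "rx4"), ("C", "rx5"), ("C", "rx6"),
    ("E", "rx7"),  # E has no ER visits, so it has no band
]


def er_visit_band(n):
    """Bands from the example: 1-2, 3-5, more than 5 ER visits."""
    if n <= 2:
        return "1-2 ER visits"
    if n <= 5:
        return "3-5 ER visits"
    return "More than 5 ER visits"


def build_mapping(outpatient_records):
    """Derive the Patient ID -> Band mapping from the creation source only."""
    er_counts = Counter(pid for pid, is_er in outpatient_records if is_er)
    return {pid: er_visit_band(n) for pid, n in er_counts.items()}


def prescriptions_per_patient(mapping, pharmacy_records):
    """Apply the mapping to the application source; unbanded patients are skipped."""
    patients_in_band = Counter(mapping.values())
    rx_in_band = Counter()
    for pid, _rx in pharmacy_records:
        band = mapping.get(pid)
        if band is not None:
            rx_in_band[band] += 1
    return {band: rx_in_band[band] / n for band, n in patients_in_band.items()}


mapping = build_mapping(outpatient)
print(prescriptions_per_patient(mapping, pharmacy))
```

Note that the two record sets are never joined row by row: the pharmacy records are scanned once and each row is looked up in the mapping.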
Even though the “complete analysis” may draw on two independent data sources, it should not be a requirement that these two data sources are merged at any point or at any aggregation level. These data sources could be massive, and therefore the performance and storage costs of merging them should be avoided.
Interface Agnostic: The use of Aggregate Banding should not be tied to any specific interface. For example, it should be possible to provide Aggregate Banding via one or more of the following interfaces/environments:
    Desktop (thick client)
    Web Interface
    Flash Application
    Mobile device (e.g. iPhone and iPad app, or Android app)
    Batch processing (scripted approach)
    Installed on own servers
    Available via SaaS from a multi-tenancy environment
Reproducibility: Any analysis or segmentation rules that are created should be able to be shared between End Users, and a single definition should be able to be used by more than one report/analysis. A definition should also be able to be defined once and re-used in the future (once the underlying data is updated), and published for others to use (without those users necessarily being able to see or change the segmentation rules).
Within a multi-tenancy SaaS environment it should be possible to define a single set of segmentation rules and to be able to use this same definition across any number of tenants of the service utilizing their own data.
It is an object of preferred embodiments of the present invention to address some of the aforementioned disadvantages. An additional or alternative object is to at least provide the public with a useful choice.