For centuries, scientists have relied on the scientific method to prove or disprove hypotheses about causality in our world. In order to show that an action (A) will cause an outcome (B), scientists typically execute a controlled experiment where the action (A) is performed on one population (a test group), and compare the outcome to a similar population where the action was not performed (a control group). Any consistent and repeatable difference in outcome (B) is then considered to be caused by the action (A).
In recent years, this approach has been adopted by retailers, banks, manufacturers, restaurants, hotels, schools, and insurers, among others, to assess the incremental impact of any business initiative on customer behavior and, hence, key performance metrics (such as customer count, revenue, and profit). These companies apply the scientific method by executing a designed test of the initiative in a subset of their locations and comparing their performance to locations that did not receive the initiative. Because the initiative was actually tried in a subset of stores, and their performance was compared to stores that did not receive the initiative, the business can have a confident perspective on “what will happen” if they execute the initiative going forward.
Examples of these business initiatives include: changing prices, moving products to different areas of the store, introducing new products, adding or reducing sales staff, remodeling older stores, and running television ads. Knowing how these initiatives will contribute to profitability before committing to a major network-wide investment allows a business to predict more accurately which ideas will work and which will waste valuable capital.
However, actively designing and executing a test can be a costly and time-intensive exercise. At the same time, for initiatives such as price changes, advertising, and changes in sales staff (as well as uncontrollable events such as weather and economic conditions), store-by-store variations already occur on a daily basis. For example, a district manager may decide to reduce the price of a key product in his 20 stores, or a store manager may decide to take on an additional employee to help out during the lunchtime rush. Regardless of the intention, these small variations can be thought of as miniature tests, or “natural experiments,” and can provide much the same insight as a designed test.
Automatically detecting such experiments can be hugely valuable to a business. First, they can be used as a substitute for actively designed tests. Executing a designed test takes significant resources: the business must determine the action to take, address operational complexities, coordinate its implementation, wait for it to be in market long enough, and measure its effectiveness. This feedback delay lengthens the time before any resultant profit-generating action can be taken, reducing its return on investment and constraining the number of tests that are feasible to run. Second, these natural experiments can provide insights on demand that lead directly to substantially more profitable decisions. For example, figuring out whether a bottle of soda should be priced at $0.89, $0.99, or $1.09, given that it costs the retailer $0.50, can swing profits by as much as 50%, depending on consumer response to each price point.
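The arithmetic behind the soda example can be sketched as follows. This is only an illustration: the function and variable names are hypothetical, prices are held in integer cents to avoid floating-point rounding, and unit volume is supplied by the caller because the text does not state how consumers respond to each price point.

```python
# Hypothetical illustration of the soda pricing example.
COST_CENTS = 50                      # retailer's cost per bottle (from the text)
PRICE_POINTS_CENTS = [89, 99, 109]   # candidate prices (from the text)


def unit_margin_cents(price_cents, cost_cents=COST_CENTS):
    """Profit earned on a single bottle at the given price."""
    return price_cents - cost_cents


def profit_cents(price_cents, units_sold, cost_cents=COST_CENTS):
    """Total profit at a price point for an assumed unit volume."""
    return unit_margin_cents(price_cents, cost_cents) * units_sold


# Per-unit margin at each candidate price: 39, 49, and 59 cents.
margins = {p: unit_margin_cents(p) for p in PRICE_POINTS_CENTS}

# Relative gap between the highest and lowest margin if volume were
# identical at both prices (an assumption, not a claim from the text).
margin_spread = margins[109] / margins[89] - 1

print(margins)
print(f"margin spread at equal volume: {margin_spread:.1%}")
```

At equal volume the highest price point earns roughly 51% more per bottle than the lowest, which is consistent with the "as much as 50%" figure. The actual swing depends on how many units sell at each price, which is exactly what a test or natural experiment would measure.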
Because these variations occur as a natural course of business and are not centrally coordinated or explicitly tracked (in contrast to a central list of stores to be remodeled), it can be very challenging to discover these examples in order to use them to assess the impact of the changes. However, because these companies are already tracking metrics such as labor hours, price charged, and advertising levels by store and by day, there exists an opportunity to automatically mine these datasets to detect instances where part of the network changed and part of the network did not, which may be considered “natural experiments.”
There are two existing alternatives for assessing the incremental impact of historic changes: (1) econometric modeling/time series regression, and (2) manually finding natural experiments. Econometric modeling treats each individual store/day combination as one observation of both the “independent variable” (e.g., price) and the “dependent variable” (e.g., units sold), and performs a basic statistical regression between the two. The regression tests whether a significant relationship exists and, if so, quantifies the size of the change in units that is commonly associated with each $1 change in price. This conventional approach is commonly used, particularly in the advertising industry, to assess the historic incremental impact of different actions. Almost all major advertisers employ media mix modeling using consultancies that apply multivariate regression modeling techniques to (1) establish the portion of historic sales that was due to advertising, and (2) forecast what incremental sales will be generated by future advertising plans. For example, a store may wish to predict how a new promotion will impact its sales. It would build a regression model to predict sales within a time period. The model would treat all such factors as variables (e.g., how much inventory was available, the use of media advertisements, the weather), and the coefficient for each variable would estimate that variable's association with sales.
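A minimal sketch of the single-variable regression described above, using ordinary least squares written out by hand so that it needs no external libraries. The store/day observations are invented for illustration, and `ols_slope_intercept` is a hypothetical helper name rather than part of any particular econometric package.

```python
def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical store/day observations: price charged (dollars) and units sold.
prices = [0.89, 0.99, 1.09, 0.89, 0.99, 1.09]
units = [120, 100, 80, 120, 100, 80]

slope, intercept = ols_slope_intercept(prices, units)

# The slope is the change in units associated with each $1 change in price.
print(f"units per $1 of price: {slope:.1f}")
```

With this invented data the fitted slope is about -200 units per $1 of price. A multivariate media mix model follows the same idea with more independent variables, but it still measures association rather than causation.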
Econometric modeling has some shortcomings. Fundamentally, this type of modeling tests for correlation, not causation. As an example, imagine the business question is whether an increase in inventory (units on hand to be sold) of a particular product will drive an increase in sales. At the same time, assume that stores with strong sales of that product have already adjusted their inventory upward to accommodate demand. Regression analysis will identify that inventory is highly correlated with sales, and an econometric approach may therefore assume that future increases in inventory will cause more sales. But in this case, it is more likely that the higher expectation for sales drives the higher inventory than that the higher inventory drives the higher sales. Conclusions drawn from correlational studies can therefore be inaccurate, and a more rigorous test versus control approach to high-stakes business decisions is needed. Finding test versus control opportunities to establish causation between a business's specific actions and resultant changes in profitability is generally recognized as a superior approach. This, for example, is why the Food and Drug Administration asks for test versus control (also known as placebo-controlled) tests to validate drug effectiveness and safety instead of relying on cross-sectional studies and regression modeling. However, it may be challenging to find or create such examples in a business environment.
A user can manually attempt to find natural experiments in the same datasets, but the user must look through each week of data for a group of stores that changed and a group of stores that did not. This method has some shortcomings, as it is highly time-intensive, complex, and error-prone. As a result, it can be very challenging to find the best subset of “test” stores that balances consistency and length of the experiment with similarity to the “control” group.
What is desired is a system to automatically scan historic datasets (e.g., price by store over time) to: (a) detect the dates on which pockets of stores change in a consistent manner while other stores do not; (b) determine the best subset of test and control stores to be used, balancing store count (in both test and control), internal consistency of the change (in test), similarity (between test and control), and length of the change; (c) compile a list of all the experiments found, and present a rank-ordered list with the “best” natural experiments at the top; and (d) allow the user to select which experiments to analyze (that is, to measure the change's impact on key performance metrics).
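One way steps (a) through (c) might be approached is sketched below. This is a simplified illustration under stated assumptions, not the system the text describes: prices are integer cents per store per period, a "consistent" change means an identical price delta in the same period, and the score (smaller group size multiplied by the number of periods the change holds) is a hypothetical stand-in for the balancing criteria. Similarity between test and control stores is not modeled.

```python
from collections import defaultdict


def find_natural_experiments(price_by_store, min_stores=2):
    """Rank candidate natural experiments found in per-store price histories.

    price_by_store maps a store id to a list of prices in cents, one per period.
    """
    stores = sorted(price_by_store)
    n_periods = len(price_by_store[stores[0]])
    experiments = []
    for t in range(1, n_periods):
        changed = defaultdict(list)  # price delta -> stores that made that change
        unchanged = []
        for s in stores:
            delta = price_by_store[s][t] - price_by_store[s][t - 1]
            if delta == 0:
                unchanged.append(s)
            else:
                changed[delta].append(s)
        for delta, test in changed.items():
            if len(test) < min_stores or len(unchanged) < min_stores:
                continue
            # Count the periods during which every test and control store
            # keeps the price it had in period t.
            length = 1
            while t + length < n_periods and all(
                price_by_store[s][t + length] == price_by_store[s][t]
                for s in test + unchanged
            ):
                length += 1
            experiments.append({
                "start_period": t,
                "price_delta_cents": delta,
                "test_stores": test,
                "control_stores": unchanged,
                "length": length,
                "score": min(len(test), len(unchanged)) * length,  # hypothetical
            })
    # Best candidates first; ties keep chronological order.
    experiments.sort(key=lambda e: e["score"], reverse=True)
    return experiments


# Hypothetical weekly prices (cents) for six stores.
prices = {
    "A": [99, 99, 89, 89, 89, 89, 89],
    "B": [99, 99, 89, 89, 89, 89, 89],
    "C": [99, 99, 99, 99, 99, 99, 99],
    "D": [99, 99, 99, 99, 99, 99, 99],
    "E": [99, 99, 99, 99, 99, 109, 109],
    "F": [99, 99, 99, 99, 99, 109, 109],
}
ranked = find_natural_experiments(prices)
for e in ranked:
    print(e["start_period"], e["price_delta_cents"], e["test_stores"], e["score"])
```

On this invented data the scan finds two candidates: a 10-cent price cut in stores A and B starting in period 2 that holds cleanly for three periods, ranked first, and a 10-cent increase in stores E and F starting in period 5. A fuller treatment would also exclude control stores that changed recently and would weigh how similar the test and control groups are.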