This invention relates to automated techniques for performing business analysis, and more particularly, to computerized techniques for performing business predictions based on incomplete datasets and/or datasets derived from stage-based business operations.
Analysts commonly use a number of statistical techniques to provide accurate predictions regarding the likely course of manufacturing operations. The success of these techniques stems from a host of analysis-friendly factors associated with the manufacturing environment. For instance, manufacturing operations can generally be precisely described in mathematical terms. The economic aspects of the manufacturing environment are also generally well understood and can be precisely articulated. Further, a typical manufacturing environment provides a well-established technical infrastructure for recording salient parameters regarding the performance of the manufacturing operation. This infrastructure, coupled with the typically large amounts of data generated in a manufacturing operation, provides a rich historical database from which to derive accurate and robust statistical models for use in performing predictions.
Other fields are not so conducive to the development and application of accurate modeling techniques. For instance, analysts may have much greater difficulty developing and applying accurate analytical models in a “pure” business-related environment, such as a finance or service-related environment. This difficulty arises from several factors. First, a business-related operation may be more difficult to precisely describe in mathematical terms compared to a manufacturing environment. This may be attributed to the fact that some of the metrics used in a business-related environment are inherently more “mushy” compared to parameters used in a manufacturing environment. This may also be due to difficulty in fitting mechanistic metaphors to a pure business operation, or due to difficulty in completely understanding (and thus modeling) complex relationships present in some business operations.
In addition, a business-related environment may not always maintain the kinds of data-rich archives found in manufacturing environments. This may be attributed in some cases to lack of suitable technical infrastructure for collecting operational data in business-related environments. In other cases, the failure to collect sufficient data may be attributed to the fact that the businesses have never collected certain kinds of information in the past, and thus the businesses may lack the kinds of cultures that encourage the regimented collection and archiving of such information. Deficiencies of this nature may result in one or more “holes” in the data that describes the past course of the business operation.
More significantly, a business may fail to collect enough data due to long cycle times found in many business environments (e.g., compared to manufacturing environments where an assembly line may quickly generate many products). The cycle time of a product refers to the span of time required to completely process the product from a defined starting point to a defined termination point. For example, the cycle time of a loan approval process for a particular candidate may be defined by the span of time measured from an initial contact with the customer to a final approval and acceptance of a loan by the customer. These types of cycle times may span several days, several months, or even several years (e.g., for some complex commercial transactions). This may mean that a new business may operate for a lengthy period of time before it develops a sufficient amount of data to faithfully represent the full range of actions performed on an asset throughout its lifecycle. Incomplete datasets are referred to by various names in the art, such as “censored” datasets or “truncated” datasets. Censored data points are those whose measured properties are not known precisely, but are known to lie above or below some limiting sensitivity. Truncated data points are those which are missing from the sample altogether due to sensitivity limits.
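The distinction between censored and truncated data points can be illustrated with a minimal sketch, offered only as an example under stated assumptions and not as part of any described technique. The names `Observation`, `observe`, `cutoff_day`, and `min_recordable_days` are hypothetical. The sketch assumes that a loan still in process on the day the data is examined is censored (its cycle time is known only to be at least the elapsed time), and that a loan resolved faster than a hypothetical recording threshold never enters the archive and is therefore truncated.

```python
from typing import List, NamedTuple, Tuple


class Observation(NamedTuple):
    days: int        # recorded cycle time, or a lower bound if censored
    censored: bool   # True when only a lower bound is known


def observe(
    loans: List[Tuple[int, int]],
    cutoff_day: int,
    min_recordable_days: int,
) -> List[Observation]:
    """Return what an analyst would see in the archive on cutoff_day.

    Each loan is (start_day, true_cycle_days). Both parameters below are
    hypothetical and chosen only for illustration:
      cutoff_day          - day on which the historical data is examined
      min_recordable_days - cycles shorter than this are never archived
    """
    observed = []
    for start_day, true_cycle_days in loans:
        if true_cycle_days < min_recordable_days:
            # Truncated: the data point is missing from the sample altogether.
            continue
        if start_day + true_cycle_days > cutoff_day:
            # Censored: the loan has not finished, so only a lower bound
            # (time elapsed so far) on its cycle time is known.
            observed.append(Observation(cutoff_day - start_day, True))
        else:
            # Complete: the full cycle time was recorded.
            observed.append(Observation(true_cycle_days, False))
    return observed


# Illustrative, made-up loans: (start_day, true_cycle_days).
loans = [(0, 30), (10, 200), (50, 2), (80, 45)]
for obs in observe(loans, cutoff_day=100, min_recordable_days=5):
    label = "at least" if obs.censored else "exactly"
    print(f"cycle time {label} {obs.days} days")
```

In this made-up sample, only one of the four loans yields an exact cycle time. Averaging the recorded values as if they were all exact would misstate the true cycle times, which is the kind of distortion an incomplete historical record can introduce.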
The problem of incomplete datasets is particularly troublesome when developing and applying business models. This is because business models are typically developed to track the empirically-established history of a business operation. Accordingly, a model developed on the basis of an incomplete historical record may fail to properly characterize the business operation as a whole. For instance, a business operation may include plural stages that together span several months. If a business has only collected data for the initial stages of the operation, then a model developed from this data may not adequately describe the later stages of the operation.
In addition to the above-noted difficulties, the nature of the operations performed in a business-related environment may differ in significant ways from the operations performed in manufacturing environments. For instance, as noted above, some business-related operations are characterized by a series of discrete steps or stages performed in a predefined order. The above-described loan processing environment is illustrative of this kind of business operation. The loan approval process can be viewed as comprising a first stage of identifying a potential customer, a second stage of assessing the risk associated with providing a loan to the potential customer (as determined by an underwriter), a third stage of receiving feedback from the customer regarding the customer's acceptance or rejection of the offered loan terms and conditions, a fourth stage of issuing the loan to the customer, and so on. As appreciated by the inventors, the individual stages in a multi-stage process may differ in fundamental ways, yet have complex interrelationships that link these stages together. Thus, unlike more routine manufacturing environments, an analyst may have difficulty developing a single model that tracks and describes these divergent stages. Viewed in mathematical terms, an analyst may have difficulty finding a single equation that fits the “shape” of all of the stages in the business operation.
The negative consequences of the above-described difficulties can be significant. This is because predictions based on a faulty model will also be faulty. Reliance on faulty predictions can result in inappropriate decisions being made within the business, effectively steering the business in suboptimal directions. Needless to say, such faulty guidance can have a negative economic impact on the business.
Techniques have been developed to address the problem of incomplete (e.g., censored) datasets. While these techniques work well with relatively small amounts of missing data, they begin to break down when a dataset contains larger amounts of missing data. Some business environments present scenarios in which the quantity of missing data approaches or even exceeds 50 percent of the total population of data that should have been collected. Traditional techniques cannot successfully handle datasets with this extent of missing data. Also, traditional techniques tend to perform poorly when handling the stage-based data collected from stage-based business operations.
For at least the above-identified reasons, there is an exemplary need in the art to develop and apply more robust models that can be used in a business-related environment. There is a more particular need to develop and apply more effective models that specifically provide accurate analysis when exposed to incomplete datasets and/or datasets predicated on stage-based business operations.