Widespread use and acceptance of computer systems have resulted in a proliferation of data. A wide variety of enterprises and individuals generate and store large amounts of data. Data mining and analysis have therefore become increasingly important as people and enterprises try to glean intelligence from their data.
However, data mining and analysis efforts have been hampered by the large number of disparate formats in which data may be stored and manipulated. Custom software is typically required to build system-wide solutions that combine data in a particular way or implement a specific analysis technique.
For example, a retail store seeking to store, mine, and analyze its data in a meaningful way faces a variety of challenges. The store may want to collect data from a variety of locations. One source of data is the store itself: inventory on hand and daily sales, for example. Another source of data is the store's suppliers. Each supplier may have its own data about inventory availability, price, and selection. The store's customers may also have their own data, such as their purchase history at that and other stores, and demographic data.
Store personnel generally are required to make a variety of decisions, such as which items to order, how much of each item to order, and at what time. Knowledge of, and the ability to analyze, data from all of the above-mentioned sources may aid in these decisions, but only if the data is readily available and can be analyzed in accordance with criteria that are useful to the store personnel.
Creating a software solution that is able to aid the store personnel in a meaningful way may require the ability to interface with the disparate systems that hold the store data, supplier data or data of other external entities, and customer data. Moreover, the software solution must be able to manipulate this information in accordance with the analytical criteria the store personnel would like evaluated.
Of course, many existing software systems aim to facilitate communication between specific disparate data types or to conduct a specific type of analysis on a given set of data. However, software developers routinely focus only on the problem at hand and build a specific software solution to connect the store's database with the database of the supplier, for example. The software itself is likely to depend on the particular data attributes used by the store and the supplier. Similarly, a software developer is likely to build an analytics solution that specifically answers the question posed by the store personnel. The analytics solution may be hardcoded to process the specific attribute types contained in the supplier and store databases, and to compute one or a series of specific calculations desired by store personnel.
Accordingly, when the store, the supplier, or a customer changes or adds a database type or a data attribute, or when store personnel would like data analyzed in a new manner, the software developer is required to intervene and change the software code used to analyze the data. This process may be cumbersome, and it may limit the store's ability to have rich interactions with its data and to change the analytical techniques used or the type of data analyzed.
Furthermore, due to the specific nature of the software involved, the store's solution will be unsuitable for another entity trying to perform a completely different data analysis task. For example, the store's inventory software solution would be completely unsuitable for analyzing traffic patterns on a nearby interstate. To handle traffic pattern analysis, the software developer would need to develop a new software solution from the ground up.
A brief summary of challenges in three areas is provided below: 1) storing and sharing data amongst different database types; 2) communicating between different and unknown system types; and 3) acting on stored information.
Storing and Sharing Data
A brief overview of database storage and interoperability challenges will be described with reference to FIG. 1. Electronic databases, such as database 10 in FIG. 1, store data and are configured to access and manipulate the stored data according to various objects specified by the database, such as ObjectA in FIG. 1. An object is generally a collection of data and methods for manipulating the data. The objects may specify a variety of methods for manipulating the data, including methods for creating, reading, updating or deleting data entries in the database. MethodA and MethodB are shown in FIG. 1 associated with ObjectA.
An application programming interface (API) 15 provides access to the methods in the database 10. The API 15, for example, supports MethodA and MethodB used in the database 10. Clients access the database 10 by communicating a message 20 to the API 15. The message names one of the methods supported by the API. So, for example, in FIG. 1, Client 25 passes a message 20 that includes a request to execute ‘MethodA’ and parameters for the execution of MethodA. Recall that MethodA is a specific method supported by the database 10.
Accordingly, the API 15 may achieve relatively high performance, because little if any manipulation must be performed on the received message 20 before passing a function call 30 to the database 10 to perform MethodA with the enclosed parameters. However, the API 15 is not very flexible.
For example, if database 10 stores customer information, ObjectA may refer to a customer, and MethodA may include a method entitled ‘GetCustomer’ which operates to retrieve a customer number based on an order number and a purchase date. Code used to implement the API 15 may then include the statement:
int GetCustomer (int OrderNumber, DateTime PurchaseDate);
‘int’ indicates the API is expecting an integer (the customer number) returned from this call. The API also expects an integer order number and a date/time formatted purchase date to be transmitted along with the request. Message 20 must then include “GetCustomer (OrderNumber, PurchaseDate)”.
The API then passes the message 20 as a function call 30 to the database 10. Databases have various internal mechanisms for communicating and for formatting function calls and parameters, including any of a variety of query languages specific to the database type, such as SQL (a relational database query language) or the like. The API 15 is specific to the database 10 and formats the message 20 into a proper function call 30 for the database 10.
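The path from message 20 to function call 30 may be sketched as follows. This is a minimal illustration only, written in Python with an in-memory SQLite database standing in for database 10; the table layout, the function name get_customer, and the sample order are assumptions made for the example and are not taken from FIG. 1.

```python
import sqlite3
from datetime import datetime


def get_customer(conn, order_number, purchase_date):
    """Hypothetical API entry point mirroring GetCustomer(OrderNumber, PurchaseDate)."""
    # The API formats the request as a query in the database's own language
    # (SQL here) and passes it on as the function call.
    row = conn.execute(
        "SELECT customer_number FROM orders "
        "WHERE order_number = ? AND purchase_date = ?",
        (order_number, purchase_date.isoformat()),
    ).fetchone()
    if row is None:
        raise LookupError("no matching order")
    return row[0]  # the integer customer number


# Stand-in for database 10, populated with one illustrative order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_number INTEGER, purchase_date TEXT, customer_number INTEGER)"
)
purchased = datetime(2020, 1, 2, 9, 30)
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?)",
    (1001, purchased.isoformat(), 42),
)

print(get_customer(conn, 1001, purchased))  # prints 42
```

Little work is done between the request and the query, which is consistent with the performance noted above, but both the method name and the query text are fixed in the code.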
There are several drawbacks to this approach for database communication that become increasingly apparent as databases become increasingly complex and numerous.
A first drawback is that the API 15 supports only the specific methods supported by the database 10. Once the source code for the API 15 is compiled, it becomes a static entity and cannot support any further methods. If the database 10 is changed to a different database that supports, say, MethodC instead of MethodA, the API 15 will not be able to generate a function call for MethodC. The API 15 will need to be rewritten and recompiled to support the new method. This requires that the API 15 be taken off-line (stopped or taken out of service) while a new API is written, compiled, and put back into service.
Another drawback is that the API 15 supports only the specific communication language used by the database 10. The API 15 only formats messages for a particular database language. Again, if the database 10 is changed to a different database utilizing some other internal language, the API 15 must be rewritten and recompiled, requiring time and a suitably skilled operator to perform the update.
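The two drawbacks above may be illustrated with a short Python sketch. The class StaticApi and the dictionaries describing the databases are hypothetical stand-ins for API 15 and database 10; the language name "OtherQL" is invented for the example.

```python
class StaticApi:
    """Hypothetical stand-in for API 15, fixed to one database's methods and language."""

    # Both values are fixed when the code is written; nothing is discovered at run time.
    METHODS = ("MethodA", "MethodB")
    LANGUAGE = "SQL"

    def function_call(self, database, method, *params):
        if method not in self.METHODS:
            # First drawback: a method added by a new database cannot be served.
            raise NotImplementedError(f"{method} is not supported by this API")
        if database["language"] != self.LANGUAGE:
            # Second drawback: only one internal database language is understood.
            raise NotImplementedError(
                f"cannot format calls in {database['language']}"
            )
        return f"{method}({', '.join(map(str, params))})"


api = StaticApi()
old_db = {"language": "SQL"}      # supports MethodA and MethodB
new_db = {"language": "SQL"}      # supports MethodC instead of MethodA
other_db = {"language": "OtherQL"}

print(api.function_call(old_db, "MethodA", 7))  # prints MethodA(7)

for database, method in ((new_db, "MethodC"), (other_db, "MethodB")):
    try:
        api.function_call(database, method, 7)
    except NotImplementedError as err:
        print("API must be rewritten:", err)
```

In either failing case, the only remedy available in this sketch is to edit the class and redeploy it, which corresponds to taking the API off-line as described above.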
The above drawbacks to database communication apply generally to all available database types including relational databases and object-oriented databases. Older database models, such as hierarchical and network models, also have the described drawbacks.
Communicating Between Different or Unknown System Types
As data collection and mining become increasingly important and sophisticated, communicating among complex systems for storing and manipulating data has become difficult. Separate applications or systems each have their own internal methods of communicating and their own syntax and vocabulary for storing and manipulating data. The same or a similar procedure may have different names in different databases or systems, for example. One application may want to call that procedure in a variety of different databases.
Present systems handle this communication difficulty by calling the specific procedure name for each system of interest. For example, a code fragment from a first system may call an IOrderFulfillment interface as part of the implementation of a method for accepting orders. An interface is generally a portion of code that receives calls from clients and mediates access to an underlying procedure or data store. The AcceptOrder method may be coded as follows:
public void AcceptOrder(string Order)
{
    IOrderFulfillment iof = new OrderFulfillment();
    iof.FulfillOrder(Order);
}
This method needs to communicate with the IOrderFulfillment interface, implemented by a second system. The fulfillment interface may itself need to communicate with a further interface, such as a shipment interface. The second system may encode an order fulfillment interface as follows:
public interface IOrderFulfillment
{
    void FulfillOrder(string Order);
}
public class OrderFulfillment : IOrderFulfillment
{
    public void FulfillOrder(string Order)
    {
        IOrderShipment iof = new OrderShipment();
        iof.ShipOrder(Order);
        iof = new OrderShipmentEx();
        iof.ShipOrder(Order);
    }
}
The above example contains two separate calls to an order shipment interface: one to OrderShipment( ) and one to OrderShipmentEx( ). These two consecutive calls are necessary to communicate with a third and a fourth system, each of which implements the order shipment interface. One implementation, the third system, codes the order shipment interface as:
public class OrderShipment : IOrderShipment
Another implementation, the fourth system, implements the order shipment interface as:
public class OrderShipmentEx : IOrderShipment
The second system above was accordingly required to make two consecutive calls, one to the order shipment interface as specifically implemented by the third system and the other to the order shipment interface as specifically implemented by the fourth system. A problem may arise if the third system, the fourth system, or both are changed to different or new systems that no longer support the particular format of the calls made by the second system. If the implementation of the shipment interface changes, the implementation of the fulfillment interface of the second system must be correspondingly changed and recompiled, which may be undesirable.
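The coupling described above may be reproduced in a short, runnable Python sketch. The class names follow the fragments above; NewShipment, its dispatch method, and the order identifier are hypothetical and stand in for a replacement fourth system that no longer supports the call format used by the second system.

```python
class OrderShipment:
    """Third system's implementation of the shipment interface."""

    def ship_order(self, order):
        return f"OrderShipment shipped {order}"


class OrderShipmentEx:
    """Fourth system's implementation of the shipment interface."""

    def ship_order(self, order):
        return f"OrderShipmentEx shipped {order}"


class OrderFulfillment:
    """Second system: names each shipment implementation explicitly."""

    def fulfill_order(self, order):
        # One hard-coded call per downstream system, as in the fragment above.
        return [
            OrderShipment().ship_order(order),
            OrderShipmentEx().ship_order(order),
        ]


def accept_order(order):
    """First system: hard-codes the fulfillment implementation in the same way."""
    return OrderFulfillment().fulfill_order(order)


shipped = accept_order("order-17")
print(shipped)


class NewShipment:
    """Hypothetical replacement system exposing a differently named call."""

    def dispatch(self, order):
        return f"NewShipment dispatched {order}"


# Replacing the fourth system breaks the caller, which still expects ship_order.
OrderShipmentEx = NewShipment
try:
    accept_order("order-17")
except AttributeError as err:
    print("fulfillment code must be changed:", err)
```

Nothing in the first or second system adapts to the replacement; in this sketch the fulfillment code has to be edited by hand before orders can be shipped again.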
Acting on Stored Information
A variety of actions may be desirable using collected, stored data. In particular, an organization or entity may desire to evaluate a variety of conditions on stored data. However, the conditions will vary according to the type of data and the purposes of the requestor. For example, a thermostat application may desire to set a temperature to a warm setting if stored data indicates it is cold outside and it is the morning. This is a simple example of a type of rule that an application may desire to evaluate. The permutations of rules may vary widely according to the end use of an application, and the types of data available.
The common approach to the execution of rules is to hard code the rule into the software system. So, for example, a thermostat application may contain lines of code that are specific to the condition and action described above, that is, to temperature and temperature settings. The code would explicitly recite parameters such as ‘temperature’ and ‘set’. Should the users of the thermostat application want to implement a different rule (for example, if an occupant is detected in a room, set the temperature to a certain level, or if it is night, set the temperature to cold), new code would be required to implement and evaluate the rule. This may require taking the system offline, preparing code, recompiling the code, and restarting the system.
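A hard-coded rule of the kind described above may look like the following Python sketch. The function name, the 50-degree threshold, and the hours treated as morning are assumptions chosen only to make the example concrete.

```python
def thermostat_setting(outside_temp_f, hour):
    """Hard-coded rule: if it is cold outside and it is morning, set 'warm'."""
    is_cold = outside_temp_f < 50   # assumed threshold, for illustration only
    is_morning = 6 <= hour < 12     # assumed definition of 'morning'
    if is_cold and is_morning:
        return "warm"
    return None  # no other rule exists in this code


print(thermostat_setting(35, 7))   # prints warm
print(thermostat_setting(35, 22))  # prints None
```

The condition and the action are both written into the function body, so a rule based on occupancy or on the time of night cannot be expressed without editing and redeploying this code.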
The above description identifies some of the challenges in supplying a robust, scalable, rule-based expert system. While software may be prepared to interface specific disparate data systems and evaluate rules for a particular given end use, the resultant software application may be no more flexible than before. That is, the software is still confined to the particular use for which it was designed, and the specific systems for which it was designed to integrate. The addition or subtraction of systems, or the revision, addition, or subtraction of rules to be evaluated may require stopping the system, commissioning the preparation of specific new software code, re-compiling the software, and redeploying the software solution. This procedure may be cumbersome and limit the usefulness of any one particular software solution. Ultimately the time involved in a system shutdown, code drafting, re-compiling, and re-deployment, along with additional time for testing, may be prohibitive in designing or deploying a system change.