Data plays a major role across the whole information spectrum. It is conventionally stored and managed in database management systems (DBMS), as discussed next.
Data:
DBMS were designed to satisfy the storage and retrieval needs of large, static software systems. They have been extremely successful in sustaining the demands of large, stable corporate entities that require little change over time. However, a dynamic environment such as biotechnology, where the main activity is research to find new relations among different organisms and compounds, demands that new results linking existing attributes with newly found or introduced ones be captured quickly; in such a case a DBMS quickly breaks down. Similarly, in data mining, where new relations are discovered in existing datasets, these relations need to be integrated with the datasets so that the newly mined relations can be put to work effectively.
Three main obstacles that best describe such a breakdown are as follows:
1. Existing DBMS are static, so any change consumes time and expensive resources because of the schema re-design process. A dynamic DBMS would allow end-users to reflect a change quickly, for example by creating new attributes or capturing new relations on-the-fly.
2. A shift in the type of user directly accessing the database is needed. Database administrators (DBAs) have guarded their systems very closely, preventing any direct access. This is essential when a DBMS is used, for example, in call centers, where the users' level of sophistication is minimal and the data at hand is important. In the biotech research example, the end-user is an experienced scientist, and the data is sometimes owned by that same scientist, who guards it with a degree of vigilance that may be higher than a DBA's.
3. Collaboration becomes essential in keeping a system up to date, which requires the system to accept new input from several sources in the form of new data categories and relationships, not only new data values. The traditional database life cycle of requirement gathering by a team, schema design, and then handing the system over to a few DBAs breaks down because it prevents sophisticated end-users from contributing directly to the system over time and thus keeping it up to date.
The above three obstacles stem mainly from the static approach of existing DBMS to change. The most widely used static models are the relational and the object-oriented models. Why these obstacles exist will become clearer after discussing the shortcomings of both models.
Conventional DBMS:
The conventional relational DBMS, the most widely used data model, is based on relational algebra. It tightly fixes attributes (data elements signifying a group of values, such as phone numbers) to their relations. For example, a relation called “Customer” is represented as a table with a group of attributes: name, address, and phone #. This is recorded in the data dictionary, a master lookup table used by the relational model to parse and execute SQL queries. An entry in the data dictionary would look like “Customer: name, address→phone #”, which means that phone # is dependent on name and address in the relation/table “Customer” and that the attributes name, address, and phone # reside in relation Customer. In the data dictionary, each table has one entry, with the table name as the key value for search, together with all the functional dependencies within that table. A user cannot get a person's phone # knowing only the person's name; the name of the relation must be supplied as well. In the relational model, the interface language available to programmers, i.e. technical users, is SQL. A user executes an SQL query such as “Select phone # from Customer where name=Matt” to get Matt's phone number out of the relational model. If the user does not know the table name “Customer”, or that the attributes name and phone # belong to the Customer table, the user will not be able to retrieve a person's phone number. This knowledge is known as the schema in the relational model: the schema identifies the relations between attributes and tables and the dependencies among attributes, and the user is required to have a priori knowledge of it in order to use the DBMS. This is one reason relational models are rigid: attributes are fixed to a relation once a schema is designed.
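The dependence on a priori schema knowledge can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The column name phone (standing in for “phone #”), the relation name Person used for the wrong guess, and the sample values are assumptions made for illustration only.

```python
import sqlite3

# In-memory database; the schema and values below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (name TEXT, address TEXT, phone TEXT)")
conn.execute("INSERT INTO Customer VALUES ('Matt', '12 Oak St', '555-0100')")

# Knowing both the table name and the attribute names, the query succeeds.
row = conn.execute(
    "SELECT phone FROM Customer WHERE name = ?", ("Matt",)
).fetchone()
known_schema_result = row[0]

# Guessing a wrong relation name fails: knowing the person's name is not enough.
try:
    conn.execute("SELECT phone FROM Person WHERE name = ?", ("Matt",))
    unknown_schema_error = None
except sqlite3.OperationalError as exc:
    unknown_schema_error = str(exc)

print(known_schema_result)
print(unknown_schema_error)
```

The second query carries the same search value as the first, yet it cannot be answered because the relation name was not known in advance.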
The design limitation of fixing attributes to relations explains the rigidity of the relational model and therefore its inability to change relationships on-the-fly. Let us assume a customer now has a cell phone number and a DBA adds it to table Customer as a quick workaround to avoid schema redesign (see FIG. 12, fourth column in table Customer). Let us say customer Matt moves (see FIG. 13). Due to delete anomalies, the customer's cell phone number will be lost, since the functional dependency of table Customer in the data dictionary now looks like “Customer: name, address→phone #, cell phone”, which renders the cell phone dependent on the name and address. To avoid losing the cell phone number, the user needs to split the Customer table into two tables, say Customer12 and Customer13 (see FIG. 14), which triggers a complete schema redesign cycle (see FIG. 17). Similarly, if phone # is found to depend only on name and not on address, a similar split will be needed (see FIG. 15) to preserve the functional dependencies as name→phone # and name→address. This rigidity can have a major business impact due to the time delays and expenses incurred. For example, when the FCC ruled that customers can keep their phone numbers irrespective of the phone company, allowing users to take their phone numbers wherever they move, phone # came to depend on name and not on address, which triggers a database re-design.
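The loss described above can be reproduced in a minimal sketch using Python's sqlite3 module. The column names, the sample values, and the way the move is applied (deleting the old row and inserting a new one that carries only the new address and landline) are assumptions for illustration. The split into Customer12 and Customer13 is one possible layout and may differ from the one shown in FIG. 14.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single table where phone and cell depend on the (name, address) key.
conn.execute(
    "CREATE TABLE Customer (name TEXT, address TEXT, phone TEXT, cell TEXT, "
    "PRIMARY KEY (name, address))"
)
conn.execute(
    "INSERT INTO Customer VALUES ('Matt', '12 Oak St', '555-0100', '555-0199')"
)

# Matt moves: the row keyed by the old address is deleted and a row for the
# new address is inserted. The cell number is removed along with the old row.
conn.execute("DELETE FROM Customer WHERE name = 'Matt' AND address = '12 Oak St'")
conn.execute(
    "INSERT INTO Customer (name, address, phone) "
    "VALUES ('Matt', '98 Elm Ave', '555-0142')"
)
cell_after_move = conn.execute(
    "SELECT cell FROM Customer WHERE name = 'Matt'"
).fetchone()[0]

# Hypothetical split: the cell number depends on the name only.
conn.execute(
    "CREATE TABLE Customer12 (name TEXT, address TEXT, phone TEXT, "
    "PRIMARY KEY (name, address))"
)
conn.execute("CREATE TABLE Customer13 (name TEXT PRIMARY KEY, cell TEXT)")
conn.execute("INSERT INTO Customer12 VALUES ('Matt', '12 Oak St', '555-0100')")
conn.execute("INSERT INTO Customer13 VALUES ('Matt', '555-0199')")

# The same move now touches only Customer12.
conn.execute("DELETE FROM Customer12 WHERE name = 'Matt'")
conn.execute("INSERT INTO Customer12 VALUES ('Matt', '98 Elm Ave', '555-0142')")
cell_after_split_move = conn.execute(
    "SELECT cell FROM Customer13 WHERE name = 'Matt'"
).fetchone()[0]

print(cell_after_move)        # None: the cell number was lost
print(cell_after_split_move)  # 555-0199: the cell number survives
```

Preserving the cell number required creating two new tables and reloading the data, which is the schema redesign cycle in miniature.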
Another aspect of the rigidity of the relational database model is that the process of designing a schema is manual, carried out by experts (database administrators) who apply a set of external business rules and data normalization rules. These rules, together with the experience of the experts, define the data model that exists in the relational database. The data normalization rules enforce the integrity of the data in the model (i.e. business rules) under design.
Other database models, such as the object-oriented database, have similar limitations. In FIG. 16, the object model encapsulates data elements with methods. The relationships and logic of the encapsulated data elements are locked in the encapsulating methods. A change mandates a change in the method, and the messages used to access that method need to change as well, so the model is also rigid to dynamic change and requires a system redesign. Object-oriented models suffer from two other main obstacles. First, data encapsulation makes it harder for an end user to access the data without knowing which method to use: the user needs a priori knowledge of the object, the correct methods, and how to instantiate the messages for accessing the data. Second, no standard query language exists in the object-oriented paradigm, whereas the ease of use and standard operators of SQL are what gained the relational model its popularity and market share.
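The encapsulation issue can be sketched in a few lines of Python. The class, method, and attribute names below are hypothetical and are not taken from FIG. 16; the sketch only shows how a relationship locked inside a method ties callers to a fixed message format.

```python
class Customer:
    """Object whose data can be reached only through its methods."""

    def __init__(self, name, address, phone):
        self._name = name
        self._address = address
        self._phone = phone

    def phone_for(self, name, address):
        # The (name, address) -> phone relationship is locked inside this method.
        if (name, address) == (self._name, self._address):
            return self._phone
        return None


matt = Customer("Matt", "12 Oak St", "555-0100")

# The caller must know the object, the method, and its exact message format.
with_full_message = matt.phone_for("Matt", "12 Oak St")

# If phone should depend on name only, the method body and every caller that
# sends the (name, address) message have to be rewritten.
try:
    matt.phone_for("Matt")
    name_only_error = None
except TypeError as exc:
    name_only_error = type(exc).__name__

print(with_full_message)
print(name_only_error)
```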
Type of User: Both models limit their user community to highly technical users, such as DBAs or programmers, who interact with the DBMS. Programmers embed statements to query or insert data in the function code they write, allowing the function to interact with the underlying DBMS. Written functions shield end users from the details of the underlying data model and from the need for a priori knowledge of it. However, this creates a layer that bars the end user from interacting with the actual data layer as stored in a conventional DBMS.
Collaboration: As noted in the previous point, the users allowed to interact with a DBMS are limited to highly technical staff. This shuts out collaboration by end-users, mainly business or non-technical users.
Information:
Beyond data, the two best-known categories of information are functions and results.
Software Systems
A software system is best described as having three layers: a data layer, where data is stored in a conventional database management system (DBMS); a functionality layer, where a group of functions is stored in a file system or a functionality server to be instantiated by a user via the interface layer; and an interface layer, i.e. the result or rendering layer, which manages the input and output of the system to the user. Each layer responds to requests and interfaces with the underlying layer, together forming a software system.
Prior art software systems of the type described in the previous paragraph have limitations that can be characterized as follows:
1. Interactions are one-way: Users request or insert data according to a predefined slot or functionality. For example, a user in a car repair shop notices that the color of a car relates to the repair intervals of certain parts. Let us say drivers of red cars can be more aggressive, resulting in more frequent tire and brake repairs. Adding the attribute “color” to relations in a conventional system is not possible for an end user. It requires technical expertise and might trigger a schema re-design. This impacts the DBMS, the functionality server, where the function must be re-written to reflect the change, and the interface, which must accommodate users entering the new requirements. That is what is meant by “one-way”: the system is used as designed, inserting or requesting data values based on some operation of the system, and it is not possible to add new requirements or change requirements in the system. If a result creates a new attribute, the result cannot be stored back into the system, i.e. the interaction is one-way.
2. Closed function-base: Inserting new functionality requires technical expertise to add functions and link them to the underlying DBMS, i.e. to create the relation between a function and the attributes/tables it needs to access. Even if the software system exposes the function base, i.e. the available functions, to users and empowers them to choose the sequence of execution by building a workflow or a data-flow-path (DFP), the pool of available functions is still closed. The system cannot allow end-users to add new functions to the pool, because making the link to the underlying DBMS requires knowledge of the underlying design of attributes and relationships.
3. User interaction: End-user interaction is limited to the interface as requested by the business needs and developed by a technical team. The end-user has no control to make a change and depends on technical resources. System updates translate into a painful process, with time delays and the expense of carrying them out.
The biggest obstacle in a software system becomes its static DBMS, which captures the business requirements of the system as relationships among data items such as attributes and tables. Because of this static nature, software systems are built in a “silo” fashion. The database is independent of the functionality server, the functions contain data retrieval statements that are not tracked, and the results that come out of the system are usually not related to the state of the software system. Hence, a change in the DBMS does not indicate which functions are affected, because no relationship between them is recorded. The programmers who insert the data access statements in the function code assume a static state of the system, so any schema re-design includes re-visiting and manually examining the function-base as well. This is the “system silo effect”. Similarly, an obtained result does not keep track of the DFP or the data sources used.
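The silo effect can be sketched with Python's sqlite3 module. The Repair table, its columns, the function name, and the renaming used to stand in for a schema re-design are all assumptions for illustration, loosely following the car repair shop example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Repair (car_id INTEGER, part TEXT, interval_days INTEGER)")
conn.execute("INSERT INTO Repair VALUES (1, 'brakes', 180)")


def average_interval(connection, part):
    # Data access statement embedded in the function; nothing in the DBMS
    # records that this function depends on table Repair.
    row = connection.execute(
        "SELECT AVG(interval_days) FROM Repair WHERE part = ?", (part,)
    ).fetchone()
    return row[0]


before_change = average_interval(conn, "brakes")

# A schema re-design moves the data under a new table name. The DBMS accepts
# the change without reporting which functions are affected.
conn.execute("ALTER TABLE Repair RENAME TO RepairHistory")

# The breakage surfaces only when the function is next executed.
try:
    average_interval(conn, "brakes")
    after_change_error = None
except sqlite3.OperationalError as exc:
    after_change_error = str(exc)

print(before_change)
print(after_change_error)
```

Finding every function affected by such a change means reading the function code by hand, since the dependency exists only inside the embedded query string.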
In a software system, the interface layer is generally either data-driven or static. For example, a web-based system using a dynamic, data-driven web layout is driven by a function that is based on the underlying static data model. The dynamic aspect of the interface is based on the permutations and combinations of the fixed pool of attributes whose data values trigger conditions causing a change in the web-state. Changing the interface behavior requires an update to the driving function(s) and to the data-model statements, which demands a priori knowledge expected only of a technical user. That is, the static DBMS remains an obstacle, because change to the system based on new attributes is limited to inserting data values in a pre-defined slot in the designed database.
The system silo effect hinders an end-user's ability to add new features or change system behavior.
Function
Advancements in function-bases, that is, repositories where functions can be stored, selected, accessed, and arranged for execution in a DFP manner by end users so that they gain more independent control in quickly achieving results tailored to their needs, are limited. A key remaining obstacle is the inability of end-users to introduce new functionality. As discussed above under the software system limitations, adding new functionality, or modifying existing functionality in a way that requires or reflects a change in the underlying data model, remains a stumbling block for end-users. It requires technical resources to intervene. At the same time, functions that have the data retrieval components, such as the SQL statement, embedded in the code are tightly coupled: they cannot be used against any other data source, which creates another layer of integration complexity. Hence, existing function-bases are static.
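The tight coupling described above can be sketched in Python as follows. Both function names, the Repair table, and the sample records are hypothetical. The second function is included only as a contrast to show where the coupling lies, and is not presented as the approach of any particular system.

```python
import sqlite3


def count_brake_repairs_coupled(connection):
    # Tightly coupled: the SQL statement, table name, and column names are
    # fixed inside the function, so it only works against this one schema.
    row = connection.execute(
        "SELECT COUNT(*) FROM Repair WHERE part = 'brakes'"
    ).fetchone()
    return row[0]


def count_brake_repairs(records):
    # Same logic, but the caller supplies the records from any source.
    return sum(1 for record in records if record["part"] == "brakes")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Repair (car_id INTEGER, part TEXT)")
conn.executemany(
    "INSERT INTO Repair VALUES (?, ?)", [(1, "brakes"), (2, "tires"), (3, "brakes")]
)
from_database = count_brake_repairs_coupled(conn)

# A second data source, e.g. rows parsed from a file, has no Repair table.
file_records = [{"part": "brakes"}, {"part": "tires"}]
try:
    count_brake_repairs_coupled(file_records)
    coupled_error = None
except AttributeError as exc:
    coupled_error = type(exc).__name__

from_file = count_brake_repairs(file_records)

print(from_database, coupled_error, from_file)
```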
This limitation affects a number of areas:
1. Collaboration: End-users cannot share functions that encapsulate experience or complex logic instrumental in obtaining certain results.
2. Data integration: A function operates on data sources, and its logic results in linking those sources, creating a relation among them. Enabling researchers and end-users to contribute in this way to large data integration efforts is key to the advancement of data integration.
Result
Results suffer from symptoms similar to those of functions. In the prior art, results, and the record of which functions and data sources were used to obtain them, cannot be stored back into a database if they introduce the need for new attributes or new relations in which to store the data. For example, consider a merchant identifying its customers via their phone numbers. One customer requests to add a cell phone number as an alternate number. If the database does not have an attribute or slot for an alternate or cell phone number, the merchant is stuck with a rigid system that makes expansion difficult, which is a common experience. Hence, results that require a DBMS change are stuck. Another shortcoming of results lies in collaboration and sharing. It is difficult for a user to find out how new results were obtained: what DFP was used (the sequence of functions, with what input parameters and which versions of the functions) and which datasets were used in the DFP. A user has no access to such valuable information about how a result was obtained, and cannot explore how new data sources or functions could be used in obtaining a result, i.e. perform what-if analysis.
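The merchant example can be sketched with Python's sqlite3 module. The table layout, the column name cell_phone, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Customers are identified by their phone number, as in the merchant example.
conn.execute("CREATE TABLE Customer (name TEXT, phone TEXT PRIMARY KEY)")
conn.execute("INSERT INTO Customer VALUES ('Matt', '555-0100')")

# The new piece of information needs an attribute the schema does not have.
try:
    conn.execute(
        "UPDATE Customer SET cell_phone = ? WHERE phone = ?",
        ("555-0199", "555-0100"),
    )
    store_error = None
except sqlite3.OperationalError as exc:
    store_error = str(exc)

# The set of attributes is unchanged; the new value has nowhere to go.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Customer)")]

print(store_error)
print(columns)
```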
Data Integration:
Because of the importance of data integration, unraveling the root symptom is important and relevant to this invention, as the tool under discussion solves the root cause. Data integration across multiple sources, namely capturing a relation between two or more items, is today typically performed as a manual activity. Doing this still remains a challenge. The key bottleneck remains the static nature of the tools and of the approach to the problem.
Various obstacles in the prior art hinder dynamic data integration. Consider the way in which people try to integrate static systems in a collaborative environment. Most integration efforts tackle the process by assigning a limited number of experts (i.e., the “Group”) to integrate a number of underlying data sources. The Group creates a parser-like engine or identifies relationships in a schema-design-like exercise. The outcome is a virtual integration layer providing a unified view for the end user. By the time the integration process is complete, two factors already limit the effectiveness of the solution developed by the Group. First, each underlying data source continues to change. Second, other experts outside the Group continue to find new integration relationships. Hence, the Group is forced to undertake another data integration effort. This iterative approach to integration fails to take into account the ongoing changes that take place while the integration process is in progress. By the time the process is complete, it is already obsolete. The time spent creating a virtual integration layer becomes the bottleneck, during which no external integration rules or changes in underlying data sources are considered. Everything is put on hold until the next integration cycle.
Three obstacles to dynamic data integration are as follows:
1. The Tools: Current data modeling technologies, which are the tools used to solve integration problems, are static. Adding a new relation triggers a schema re-design in the relational model, and a similar effect takes place in the object-oriented model. On top of that, a parser-engine-like approach, which acts as the virtual integration layer, locks in the logic linking data items from the underlying data sources, rendering it static as well. This creates a compound stagnant effect: it not only locks the logic in the parser, it also mandates that the underlying data sources remain static. This multifaceted static nature of current integration approaches, i.e. a static virtual integration layer depending on static underlying data sources, yields a fragile and inflexible solution.
2. The Expertise: The process of schema redesign mandates a restricted number of participants. During an exercise to update a schema, expertise is restricted to the few available participants who have expert knowledge of the field and of the data modeling tools. Similarly, the process of creating a virtual integration layer is limited to a small number of participating experts who produce a single snapshot. This static design methodology captures only the experience of a few at a point in time.
3. The Logic: Integrating data items is based on logic that relates those items to one another. The relating logic varies depending on some similarity factor such as complexity or dimensionality. In dimensionality, like data items are those that share the same space, time, or experiment, etc. In complexity, like data items are those that share a similarity resulting from a search, say a Blast search (a function that matches an input gene sub-string to a gene in the human map, thereby identifying a relation), or from running a data mining function, which is a more complex function. Such similarity can be based on one function or a series of functions leading to the complex logic.
Data integration (DI) is an example that highlights the severity of the problem at hand. The use of static tools across all information categories limits the ability to identify or pursue a practical and timely solution. A large number of disciplines that need to deal with legacy data face data integration and data access challenges; biotechnology is just one example.
Integration—Beyond Data:
Integration may alternatively be viewed as creating a mathematical/software model to represent a real-life scenario. Modeling involves capturing the relationships and interactions among different real-life elements in a model. Data modeling, like schema design, is a snapshot in time. The art of modeling is therefore the art of successfully identifying and selecting those elements that affect the environment under study, the relationships among those elements, and which elements are to remain constant in the environment. Modeling is about fixing parameters: which elements will be changing, the possible changes for each element, and which elements will remain static. For example, in modeling the traffic flow of cars in a city center, one may fix the car width and the number of lanes per street, depending on the total width of existing streets. On the other hand, the timing of green lights and the flow rate of cars may be kept variable in order to experiment with model-optimizing parameters such as green light duration. Hence, changing a fixed element or introducing new variables in a model will result in a re-design. Models are thus created following the same static approach as a data model.
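The distinction between fixed and variable elements in the traffic example can be sketched in Python. All names and numeric values are illustrative assumptions, as is the capacity rule (lanes times saturation flow times the green share of the signal cycle), which is a toy formula chosen only to make the sketch runnable. The fixed signal cycle length is also an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedParameters:
    # Elements held constant in the model (illustrative values).
    lanes: int = 2
    car_width_m: float = 1.8  # fixed by the modeler; unused by the toy rule
    cycle_s: float = 90.0
    saturation_flow_per_lane: float = 1800.0  # cars per hour of green


def cars_served_per_hour(fixed, green_s):
    # Toy capacity rule: lanes * saturation flow * share of the cycle on green.
    return fixed.lanes * fixed.saturation_flow_per_lane * green_s / fixed.cycle_s


fixed = FixedParameters()

# The green-light duration is the variable being experimented with.
capacity_by_green = {g: cars_served_per_hour(fixed, g) for g in (30.0, 45.0, 60.0)}

# Changing a fixed element is not an experiment; the frozen model rejects it.
try:
    fixed.lanes = 3
    change_error = None
except AttributeError as exc:
    change_error = type(exc).__name__

print(capacity_by_green)
print(change_error)
```

Varying the green duration needs no change to the model, whereas adding a lane means constructing a new FixedParameters instance and re-running everything that depends on it, which mirrors the re-design described above.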
Mathematical modeling usually focuses more closely on the relationships in a model and the rules of interaction among its elements. These aspects may be captured in a function that mimics and governs the interaction. However, functions in a model still rely on data sources to simulate the targeted environment. Hence, the same inflexibility problems discussed earlier for the prior art make it difficult to deal with redesigns that require a change in the model.