1. Technical Field
The embodiments herein generally relate to data integration processes, and particularly to a method of integrating data from a source to a destination. The embodiments herein more particularly relate to a method and system for integrating data from a source to a destination in real time.
2. Description of the Related Art
A data integration process involves combining data residing in different sources and providing users with a unified view of the data. In this process, data is frequently fetched (polled) from a source system so that the changes are reflected in the destination. Data integration is significant in a variety of situations, in both commercial and scientific fields. In the commercial field, data integration plays a very important role in merging the databases of two similar companies. Similarly, in the scientific field there is a need to integrate data when combining research results from different repositories.
An important aspect of data integration is writing the polled data at the destination after processing it. At present there are many systems and methods for processing data. Polling data from a source system means reading the data from the source system. Processing the data involves writing the authenticated data into the destination system after reading it from the source system. The polling frequency indicates how often the data is polled and read from the source. The time difference between two successive polling processes is referred to as the polling interval. Polled data can relate to the same entity, to different entities, or to a new entity.
An integration solution is always expected to process the data as soon as it is generated at the source. Polling can be done frequently to bring the changes over quickly. But frequent polling alone does not bring the source and destination systems into a synchronized state until the polled data is processed; the systems are synchronized only after the data is processed. Therefore data processing needs to be fast in addition to polling being frequent. One way of achieving quick processing is to process the data in parallel. Parallel processing works as long as the data items are independent in nature, but it causes either inconsistency or failure when the data items depend on each other.
An integration solution provides an effective way to keep two systems in a synchronized state, so it is very important to transfer all the changes from the source to the destination. To fetch all the changes, the integration solution has to look into the source after every preset time interval to check for any updates to an entity. The standard integration solution is designed based on the current state of the data. Consider a case in which an entity gets updated more than once between two successive polls. Suppose a first poll is done at a time t1 and the next poll is done at a time t5. After t1 and before t5, the entity E1 is updated twice, at the times t2 and t4: c1 is the change made at t2 and c2 is the change made at t4. Now suppose that the change c2 depends on the change c1, so that c2 can be made only after c1 has been made. During the integration process, the poll done at t5 fetches the entity E1 as updated at t4, so only the latest state of E1, which reflects c2, is polled and synchronized to the destination. The change c1 is never synchronized to the destination, and c2 fails because its prerequisite c1 is not found in the destination. One way of solving this problem is to attach a trigger to the source. On each update in the system, the trigger publishes a change list to the integration system, which can then work on it effectively. In the above case, as soon as the change c1 is made, a trigger invokes the integration application with the set of changes made in c1, and the same is repeated for c2. This allows the integration system to track all the changes made in the system and to synchronize them immediately. But consider the case in which the integration system goes down because of a power shutdown, a system crash or some other run-time failure.
Whenever there is a change in the system, the trigger invokes the integration system and no other component is activated. When the trigger does not initiate any action, the integration system misses the change. Even when the trigger raises an exception, the integration system misses the change. When the integration system is brought back up afterwards, it is unaware of the changes made to the entity in the source system and tries to synchronize the next incoming change normally. Thus attaching a trigger does not always work either; this solution is not versatile and does not provide a robust approach.
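The state-based polling failure with the changes c1 and c2, described before the trigger-based alternative, can be reproduced in a minimal Python sketch. All names (`SourceEntity`, `Destination`, `poll_current_state`) are hypothetical; the sketch assumes the source keeps only the latest state of E1 and the destination rejects c2 unless c1 has already been applied.

```python
from dataclasses import dataclass, field


@dataclass
class SourceEntity:
    """Hypothetical source record that keeps only its current state."""
    entity_id: str
    state: str = "created"
    updated_at: int = 0

    def update(self, change: str, t: int) -> None:
        # Each update overwrites the previous state; no change history is kept.
        self.state = change
        self.updated_at = t


@dataclass
class Destination:
    """Hypothetical destination in which change c2 requires c1 to be applied first."""
    prerequisites: dict = field(default_factory=lambda: {"c2": "c1"})
    applied: list = field(default_factory=list)

    def apply(self, change: str) -> bool:
        required = self.prerequisites.get(change)
        if required is not None and required not in self.applied:
            return False  # prerequisite missing, so the write fails
        self.applied.append(change)
        return True


def poll_current_state(entity: SourceEntity, since: int) -> list:
    """State-based poll: returns only the latest state, if it changed after `since`."""
    return [entity.state] if entity.updated_at > since else []


e1 = SourceEntity("E1")
destination = Destination()
e1.update("c1", t=2)
e1.update("c2", t=4)
polled = poll_current_state(e1, since=1)  # poll at t5; the previous poll was at t1
results = [destination.apply(change) for change in polled]
print(polled, results)  # ['c2'] [False]
```

Because the poll at t5 sees only the latest state, c1 never reaches the destination and c2 is rejected.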
Every integration solution has two functions. The first is polling: fetching all the changes to an entity from the source system and passing them on for further processing. The second is acting as an adapter that accepts the changes coming from other systems and writes them to the destination. During polling, besides fetching the data, the integration solution has to handle various expected and unexpected failure cases to ensure that all the changes are polled and that a given change is fetched only once. Similarly, the adapter is responsible not only for writing the changes but also for ensuring that no change is written twice and that no change is overwritten in the destination. Even when the polling module ensures that no change is polled and sent twice, the adapter must still guard against an update request arriving more than once. In any good integration solution, all parts are decoupled from each other, and no component is aware of anything beyond its own function. The polling part is not aware of how the adapter works, and the same holds for the adapter. The adapter therefore sometimes has to handle a situation in which the same change arrives to be written twice to the destination. Consider a case in which the adapter does not keep a record of the changes it has written. The adapter receives an event E1, writes it to the destination with state S1, and the system then goes down. A user now updates the destination entity to a state S2. If the polling module sends E1 again at this moment for some reason, the adapter writes E1 again, rolling the state back from S2 to S1. To ensure that the adapter does not roll back changes made by an external user, it is important to check whether the event has already been addressed.
One way to prevent such a rollback is to compare the current (latest) state in the destination with the incoming new values and to update the destination only when the latest state differs from the new value. But this solution does not work when the destination has been updated by another user, because after that update the incoming new values may no longer equal the current state in the destination. As a result, the adapter overwrites the new changes made by the user and rolls back to the old state, which leads back to the original problem. Thus the currently available solutions do not fully and reliably handle failures in the integration process, and with them the synchronization cannot be recovered from every kind of failure.
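The two adapter strategies discussed above, comparing the incoming value with the current destination state versus checking whether the event has already been addressed, can be contrasted in a minimal Python sketch. The class `Adapter`, its methods `write_if_changed` and `write_once`, and the helper `replay_after_user_edit` are hypothetical names; the destination is modelled as a plain dictionary, and a shared set of event ids stands in for the durable record a real adapter would need to survive a crash.

```python
class Adapter:
    """Hypothetical adapter writing source events into a destination (entity id -> state)."""

    def __init__(self, destination: dict, processed_events: set):
        self.destination = destination
        # Ids of events already written. A real adapter would need this record to be
        # durable so that it survives a crash; a shared set stands in for that here.
        self.processed_events = processed_events

    def write_if_changed(self, entity_id: str, incoming: str) -> bool:
        """Naive guard: skip the write only when the destination already equals the value."""
        if self.destination.get(entity_id) == incoming:
            return False
        self.destination[entity_id] = incoming
        return True

    def write_once(self, event_id: str, entity_id: str, incoming: str) -> bool:
        """Skip any event that has already been addressed, whatever the destination holds now."""
        if event_id in self.processed_events:
            return False
        self.destination[entity_id] = incoming
        self.processed_events.add(event_id)
        return True


def replay_after_user_edit(use_event_tracking: bool) -> str:
    """Write event E1 as S1, let a user change TE1 to S2, then re-send E1."""
    destination, processed = {}, set()
    adapter = Adapter(destination, processed)

    def write(current: Adapter) -> bool:
        if use_event_tracking:
            return current.write_once("E1", "TE1", "S1")
        return current.write_if_changed("TE1", "S1")

    write(adapter)                            # first delivery writes S1
    destination["TE1"] = "S2"                 # an external user updates the destination entity
    restarted = Adapter(destination, processed)  # adapter restarts with the persisted record
    write(restarted)                          # the poller re-sends E1 after the failure
    return destination["TE1"]


print(replay_after_user_edit(use_event_tracking=False))  # S1 (the user's S2 is rolled back)
print(replay_after_user_edit(use_event_tracking=True))   # S2 (the user's change is kept)
```

Once a user has changed the destination, a value comparison cannot tell a re-sent event from a new one, whereas a record of addressed events can.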
The main job of an integration solution is to synchronize data from a source to a destination and to keep all the updates made in the source synchronized with the destination. It is critical for an integration solution to ensure that the source data is written at the right place in the destination. Any failure in the synchronization process results in an invalid or irrecoverable condition in the destination, which may cost the company time or money because important data is no longer valid. Hence there is a need to develop an integration process that ensures the data is written at the right place in the destination.
Consider a case in which a user creates an entity E1 in a source system, and the integration solution fetches the entity and writes it into the destination as TE1. The synchronization has then been carried out once, but further updates to E1 must also be handled: the integration solution has to ensure that they are written to TE1. If a failure in the integration process causes TE2 to be updated instead, the user loses data in TE2 and TE1 is left unsynchronized. The integration solution therefore has to identify the right entity, since an update must be written only to the right entity in the destination. The integration also has to confirm that E1 in the source is in the same state as TE1 in the destination. The currently available solutions for achieving this are as follows. The first checks the name or title of the entity; the second writes a primary key into a custom field in the target. When entities are searched by name, an entity E1 created in the source with the name N1 is synchronized to the destination with the same name N1. When E1 is next updated, the destination is searched for the entity with the name N1; when TE1 is found by the name N1, TE1 is updated correctly. But systems can allow different entities to have the same name. For example, when an entity E2 in the source is also given the name N1, TE2 in the destination is also given the name N1. In such a condition it is difficult to select, by name, the destination entity (TE1 or TE2) to update when E1 or E2 is updated at the source, because both TE1 and TE2 have the name N1. Thus searching for an entity by name does not result in a proper update in the destination. Hence a global id is generated to solve this problem.
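The ambiguity of a name-based lookup, compared with a lookup keyed on a unique identifier, can be shown with a minimal Python sketch. `DestinationIndex` and its methods are hypothetical; `by_source_id` stands for any mapping keyed on a unique value, such as the source primary key written into a custom field or a generated global id.

```python
class DestinationIndex:
    """Hypothetical index of destination entities, searchable by name or by source id."""

    def __init__(self) -> None:
        self.by_name = {}       # entity name -> list of destination ids with that name
        self.by_source_id = {}  # source id (or global id) -> destination id

    def record_sync(self, source_id: str, name: str, dest_id: str) -> None:
        # Called when a source entity is first synchronized to the destination.
        self.by_name.setdefault(name, []).append(dest_id)
        self.by_source_id[source_id] = dest_id

    def find_by_name(self, name: str) -> str:
        matches = self.by_name.get(name, [])
        if len(matches) != 1:
            # Zero or several candidates: a name alone cannot pick the right entity.
            raise LookupError(f"{len(matches)} destination entities named {name!r}")
        return matches[0]

    def find_by_source_id(self, source_id: str) -> str:
        return self.by_source_id[source_id]


index = DestinationIndex()
index.record_sync("E1", "N1", "TE1")
index.record_sync("E2", "N1", "TE2")  # a second source entity with the same name
try:
    index.find_by_name("N1")
except LookupError as err:
    print(err)                         # 2 destination entities named 'N1'
print(index.find_by_source_id("E2"))   # TE2
```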
With the existing global id generation methods, it is very difficult to find the replicas of an entity across all the synchronized systems. Further, the existing solutions are not extensible.
According to an embodiment herein, the system further uses an event-based trigger and a scheduler-based trigger to poll and synchronize data from a source to a destination.
When a poll is already active at the time of a scheduler-based trigger, the scheduler-based trigger is skipped. When an event-based trigger is received and no poll is active, the event-based trigger kicks off the polling process. When a poll is active, the event-based trigger sets a flag indicating a need to repoll at the end of the poll. At the end of each poll, the repoll flag is checked; when it is set, another poll cycle is kicked off immediately.
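The trigger rules above can be sketched as a small Python coordinator. The class name `PollCoordinator`, the `poll_once` callable, and the use of a `threading.Lock` to guard the active and repoll flags are assumptions made for illustration, not the specified design.

```python
import threading


class PollCoordinator:
    """Hypothetical coordinator for scheduler-based and event-based poll triggers."""

    def __init__(self, poll_once):
        self._poll_once = poll_once  # callable that performs one poll cycle
        self._lock = threading.Lock()
        self._active = False         # True while a poll cycle is running
        self._repoll = False         # set by an event trigger that arrives mid-poll
        self.cycles = 0

    def on_schedule(self) -> None:
        with self._lock:
            if self._active:
                return               # a poll is active: skip this scheduler trigger
            self._active = True
        self._run()

    def on_event(self) -> None:
        with self._lock:
            if self._active:
                self._repoll = True  # remember to poll again at the end of this poll
                return
            self._active = True
        self._run()

    def _run(self) -> None:
        while True:
            self._poll_once()        # runs outside the lock so triggers can still arrive
            self.cycles += 1
            with self._lock:
                if not self._repoll:
                    self._active = False
                    return
                self._repoll = False  # flag was set: start another cycle immediately


calls = []


def poll_once():
    calls.append(len(calls))
    if len(calls) == 1:
        coordinator.on_schedule()    # skipped, because a poll is active
        coordinator.on_event()       # sets the repoll flag


coordinator = PollCoordinator(poll_once)
coordinator.on_event()
print(coordinator.cycles)            # 2
```

In this sketch a scheduler trigger that arrives mid-poll is simply dropped, while an event trigger that arrives mid-poll is remembered as a repoll, so a change signalled during the running poll is still fetched by a later cycle.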
Hence, there is a need for a method of integrating data from a source to a destination in real time that replicates in the destination all the changes made to an entity in the source without missing any updates. There is also a need for a method that addresses the problems of incremental changes, bulk changes, and changes from multiple locations of a source. Further, there is a need for a solution that integrates data based on an as-of state of the entity. Yet further, there is a need to integrate data using a multistep recovery process.
The abovementioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.