1. Technical Field
The embodiments herein generally relate to a process of data integration and particularly relate to a method of polling and processing data in a data integration process. The embodiments herein more particularly relate to a method and system for polling and processing data in real time.
2. Description of the Related Art
A data integration process involves combining data residing in different sources and providing users with a unified view of that data. In this process, data is frequently fetched (polled) from a source system so that the changes can be applied to a destination system. Data integration is significant in a variety of settings, including the commercial and scientific fields. In the commercial field, data integration plays a very important role in merging the databases of two similar companies. Similarly, in the scientific field, there is a need to integrate data when combining research results from different repositories.
In a data integration process, polling data from a source system involves reading the data from the source system and then writing the data to a destination system. The writing of the data to the destination system is called processing of the data. The polling frequency is how often the data is read from the source. The time difference between two successive polling operations is referred to as the polling interval. The polled data can belong to the same entity, to different entities, or to a new entity.
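The terms defined above can be illustrated with a minimal sketch. It is not taken from the embodiments: the function names, the dictionary-based change log, and the `timestamp` field are assumptions chosen only for illustration. One poll cycle reads the changes recorded since the previous poll (polling) and writes them to the destination (processing); the polling frequency is the reciprocal of the polling interval.

```python
def poll_and_process(source_log, destination, last_seen):
    """Run one poll cycle and return the updated last-seen time stamp.

    source_log, destination, and last_seen are hypothetical stand-ins for
    a source change log, a destination system, and the poller's bookmark.
    """
    # Polling: read every change recorded after the previous poll.
    changes = [c for c in source_log if c["timestamp"] > last_seen]
    # Processing: write each change to the destination, oldest first.
    for change in sorted(changes, key=lambda c: c["timestamp"]):
        destination[change["entity"]] = change["value"]
    # Remember the newest time stamp read, for use in the next cycle.
    return max((c["timestamp"] for c in changes), default=last_seen)


def polling_frequency(polling_interval_seconds):
    """Polls per second for a given interval between successive polls."""
    return 1.0 / polling_interval_seconds


source_log = [
    {"entity": "A", "value": 1, "timestamp": 5},
    {"entity": "B", "value": 2, "timestamp": 8},
    {"entity": "A", "value": 3, "timestamp": 12},
]
destination = {}
last_seen = poll_and_process(source_log, destination, last_seen=0)
```

In this sketch, changes to the same entity overwrite one another in time stamp order, and a change to a new entity simply adds a new key to the destination.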
One of the issues in polling and processing data is the possibility of missing some changes when the change logs are read while users are actively making changes and hence writing to the logs. For example, consider a case in which a user U1 makes changes at time T1 and a user U2 makes changes at time T2 (T2>T1). For various reasons, such as process scheduling and database operations, the changes made by user U1 may actually be committed after the changes made by user U2. Suppose the changes made by user U2 are committed at time CA and the changes made by user U1 are committed at time CB (CB>CA). A poll performed between CA and CB reads only the changes committed so far; as a result, only the changes made by user U2 are fetched in that poll, and not the changes made by user U1. At the same time, the poll takes T2, the latest time stamp found in the history, as its reference point. In the next poll cycle, only the changes made after T2 are fetched, so the changes made by user U1, which carry the earlier time stamp T1, are missed. This type of problem is called the passing in night problem.
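The scenario above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the method of the embodiments: the `Change` record, the `poll` function, and the numeric values chosen for T1, T2, CA and CB are all hypothetical. The poller fetches changes that are already committed and whose time stamp is later than the last time stamp it has seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    user: str
    made_at: int       # time stamp written on the change (T1 or T2)
    committed_at: int  # time at which the change becomes visible (CB or CA)


def poll(log, now, last_seen):
    """Naive poll: fetch committed changes stamped after last_seen."""
    visible = [c for c in log
               if c.committed_at <= now and c.made_at > last_seen]
    # The newest time stamp read becomes the reference for the next poll.
    new_last_seen = max((c.made_at for c in visible), default=last_seen)
    return visible, new_last_seen


# Assumed values: U1 changes first (T1 < T2) but commits last (CB > CA).
T1, T2 = 10, 20
CA, CB = 25, 35
log = [Change("U1", T1, CB), Change("U2", T2, CA)]

# First poll falls between CA and CB, so only U2's change is visible.
first, last_seen = poll(log, now=30, last_seen=0)
# Second poll runs after CB, but U1's time stamp T1 is older than T2.
second, _ = poll(log, now=40, last_seen=last_seen)

fetched = [c.user for c in first + second]
```

With these assumed values, the first poll fetches only the change by U2 and advances the reference time stamp to T2; the second poll then skips the change by U1 because T1 is earlier than T2, so that change never reaches the destination.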
A standard polling solution does not deal with the passing in night problem.
Hence, there is a need for a method for polling and processing data in real time. There is also a need for a method to address the passing in night problem in data integration.
The abovementioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.