1. Field of the Invention
The present invention relates in general to the field of processing information and more specifically to knowledge representation for changing entities.
2. Description of the Related Art
According to Plutarch in his Life of Theseus, the mythical founder-king Theseus of Athens returned from Crete on a ship that had each plank replaced during the voyage. Plutarch wondered if the ship that set sail from Crete was the same ship that arrived at Athens. In a similar vein, Plato in Cratylus quotes Heraclitus as observing that one cannot step in the same river twice because new waters continue to flow in. Socrates himself expresses related concerns about characterizing or naming things that change: “I think we should abandon names of this kind: our greatest probability of finding correctly assigned names in connection with those things that have a permanent being and nature.” However, things that do not have a “permanent being and nature” often still require names and other properties.
Though a formidable philosophical literature on these topics exists, the issues are not only academic. For example, what happened to Dade County, Florida? It was there on Nov. 12, 1997, with FIPS code 25. On Nov. 13, 1997, “Dade County” changed its name to “Miami-Dade County”, and the new name resulted in a FIPS code change to 86.
Unfortunately, systems that do not track identity through time will report erroneous data in these cases. For example, SAS software's Problem Note 31231 reports, “The SAS/GRAPH map data sets MAPS.COUNTY, MAPS.COUNTIES, and MAPS.USCOUNTY incorrectly use the county FIPS code value of 25 rather than the county FIPS code of 86 for Miami-Dade county, Florida.” Furthermore, “the SAS/GRAPH map data set MAPS.CNTYNAME incorrectly uses the name ‘Dade’ and the FIPS value of 25 rather than the correct name ‘Miami-Dade’ and FIPS value of 86 for Miami-Dade county.” The problem report concludes that “this [error] can cause problems when trying to map response data that contains the correct FIPS code.” However, the problem report understates the nature of the problem. If one makes the changes suggested in the problem report and subsequently runs an analysis on data older than Nov. 13, 1997, the analysis will be incorrect, because that older data carries the former code 25, which the updated data sets no longer recognize. In fact, there is no way for this system, as currently designed, to give correct results for data that spans Nov. 12, 1997. What at first might appear to be a mere annoyance can result in serious errors that are difficult to detect despite their substantial impact on subsequent analysis.
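By way of illustration only, the following Python sketch shows one way a system could resolve the Dade County case by attaching an effective date to each name and code. The entity key "county:fl-dade" and the function names are hypothetical and do not appear in any cited system; the names, codes, and change date are those given above. It is a minimal sketch under those assumptions, not a description of any particular implementation.

```python
from datetime import date

# Hypothetical history for one persistent entity. Each entry records the date
# from which a name and FIPS code apply. The key "county:fl-dade" is an
# invented identifier used only for this illustration.
HISTORY = {
    "county:fl-dade": [
        (date.min, "Dade County", 25),
        (date(1997, 11, 13), "Miami-Dade County", 86),
    ],
}


def attributes_on(entity, on):
    """Return the (name, fips) pair in effect for `entity` on the date `on`."""
    current = None
    for effective_from, name, fips in HISTORY[entity]:
        if effective_from <= on:
            current = (name, fips)
    return current


def entity_for_fips(fips, on):
    """Return the entity that carried `fips` on the date `on`, or None."""
    for entity in HISTORY:
        if attributes_on(entity, on)[1] == fips:
            return entity
    return None


# A record dated before the change and a record dated after it resolve to the
# same entity, even though they carry different codes.
before = entity_for_fips(25, date(1997, 11, 12))
after = entity_for_fips(86, date(1997, 11, 13))
print(before == after)
```

With a date-scoped lookup of this kind, data on either side of Nov. 13, 1997 can be joined on the persistent entity rather than on a code that is only valid for part of the period.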
Many other examples exist. Standard Industrial Classification (SIC) industry code 1211 (“Bituminous coal and lignite”) was split into codes 1221 (“Bituminous coal and lignite—surface”) and 1222 (“Bituminous coal—underground”) in the 1987 SIC industry code update. A time series that spans that transition and references those codes could be difficult to process correctly. As another simple example, the company formed as “AOL Time Warner” has at various stages acquired and spun off “AOL”, “Time Warner”, “Time Warner Cable”, and a different “AOL”, while also experiencing at least one renaming (to “Time Warner” from “AOL Time Warner”). Germany has had eight different currencies since 1873. The Kingdom of Montenegro became Montenegro via Yugoslavia. Apple Inc. has sold a succession of different iPhones. Such examples exist in many domains.
Some systems provide what is essentially a synonym service using, for example, the Resource Description Framework (“RDF”) property owl:sameAs. However, this property has no notion of time or context. “AOL Time Warner” should not be owl:sameAs “Time Warner” at any time. The Freebase system, which is a sophisticated system in many respects, does not even attempt to model the persistence of identity through time:
    Identity over time is a difficult subject to model on Freebase. If [a] building has over time been used as church and a dance club, Freebase will type it as a church and as a dance club. This gives rise to coherency issues. Right now Freebase doesn't have a (sic) infrastructure to easily model change. [http://wiki.freebase.com/wiki/Identity_over_time]
Part of the difficulty is that modeling identity over time is hard enough on its own that an application not dedicated to that task simply cannot afford an excursion into that territory.