This invention relates to methods and devices for analyzing geological data or other information related to underground formations or zones that may contain oil or other natural resources. More specifically, the invention provides an improved method for correlating formation data stored in several databases.
Location-dependent data and other information related to an underground formation often exist in more than one database, e.g., seismic survey data, satellite or aircraft survey data, water or brine chemistry analysis, temperature survey data, geological analysis of surface outcroppings, core sample analysis, published well data, vendor, contractor, or third party data, and various proprietary well logs. In addition, summaries, abstracts, interpretations, and other full or partial duplications of some information may exist in still other databases. Improved correlation of the information from these distributed and sometimes disparate information sources, including the elimination of duplications, has been a long-term goal of underground formation analysts. Improved correlation should increase the understanding of a formation's properties and help to determine the commercial potential of recoverable natural resources that may exist in the formation. It should also allow less costly drilling and/or other methods for recovering the natural resources in the formation.
However, elimination of duplication and other improvements to correlating location-dependent data can require significant time and money. Eliminating duplication can be difficult when different information identifiers are used for the same data in several databases, e.g., a summary or abstract in database #2 of Wellname no. 1 data in database #1 may be called Fieldname no. 1 data or Wellname-1 data in database #2. In another application, imprecise location information in database #1 may not allow direct correlation with another data set in database #2 that may be more precisely located or whose location information has been altered, e.g., truncated or rounded. In still other applications, the information may be stored in different databases having transposition errors, stored using location information having a different reference point, derived from sources having different levels of precision, stored using different naming standards, or may have other differences that make identification of duplication and correlation of the remaining information difficult.
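The failure modes above can be illustrated with a minimal sketch. The record fields, well names, and coordinates below are invented for illustration and are not taken from the invention; the sketch merely shows why exact comparison misses duplicates whose identifiers differ or whose coordinates have been rounded, while a tolerance-based comparison can still find them.

```python
def naive_match(rec_a, rec_b):
    """Exact comparison: fails when identifiers or coordinate precision differ."""
    return (rec_a["name"] == rec_b["name"]
            and rec_a["lat"] == rec_b["lat"]
            and rec_a["lon"] == rec_b["lon"])

def tolerant_match(rec_a, rec_b, loc_tol=0.01):
    """Match within a location tolerance, ignoring identifier spelling."""
    return (abs(rec_a["lat"] - rec_b["lat"]) <= loc_tol
            and abs(rec_a["lon"] - rec_b["lon"]) <= loc_tol)

# Same well, recorded twice: database #2 renamed it and rounded its location.
db1_record = {"name": "Wellname no. 1", "lat": 31.84721, "lon": -102.36755}
db2_record = {"name": "Wellname-1", "lat": 31.847, "lon": -102.368}

assert not naive_match(db1_record, db2_record)   # exact comparison misses the duplicate
assert tolerant_match(db1_record, db2_record)    # tolerance comparison finds it
```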
The inventive computer-based method uses an algorithm to quickly identify duplicative location-dependent information or other attributes using textual identifiers, location boundaries, and/or tolerances. The method allows correlations to be accomplished for location-dependent data and other information with or without precise location information, with or without consistent identifiers, and with or without abstracting or other errors in the data itself. One embodiment of the method determines a bounded area based at least in part on location information and/or location tolerances. The bounded area is compared to the location information of a test attribute in another database (possibly along with comparisons of other information) using a multi-pass algorithm to determine if the test attribute is likely to be duplicative information (e.g., within the bounded area) and, if not, allow correlations with other non-duplicative information. The method may also use location proximity, textual tolerances, confidence levels, and/or other information and factors to select the most correlatable information. The method also allows the option of elevating correlatable information to a higher level database where the information may be further correlated, displayed, or otherwise used.
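One way the bounded-area test and multi-pass comparison described above could be sketched is shown below. This is an illustrative Python sketch, not the claimed implementation: the function names, the two-pass ordering (location first, then textual identifier), and the classification labels are assumptions chosen for clarity.

```python
def in_bounded_area(attr, center, tol):
    """Pass 1: is the test attribute's location inside the bounded area
    defined by the reference location and its tolerances?"""
    return (abs(attr["lat"] - center["lat"]) <= tol["lat"]
            and abs(attr["lon"] - center["lon"]) <= tol["lon"])

def normalized(name):
    """Strip case and punctuation so variant spellings can be compared."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def classify(reference, test_attr, tol):
    """Multi-pass comparison of a test attribute against a reference record."""
    # Pass 1: location bounding -- attributes outside the bounded area are
    # treated as non-duplicative and remain available for correlation.
    if not in_bounded_area(test_attr, reference, tol):
        return "non-duplicative"
    # Pass 2: textual identifier comparison within the bounded area.
    if normalized(test_attr["name"]) == normalized(reference["name"]):
        return "likely duplicate"
    # Inside the bounded area but with a different identifier: flag for
    # further passes or review rather than deciding outright.
    return "possible duplicate"
```

A later pass could also compare the numerical values of the data themselves, per the embodiment described below.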
A preferred embodiment uses concatenated identifiers with one or more PERL multi-pass algorithms to create new index arrays. The algorithm is used to test location and textual name information in the concatenated identifiers (and possibly to also test the numerical value of the data, hierarchy lists and/or other information) to detect duplication between data sets and identify high confidence data that can be accessed by a higher level data layer.
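The concatenated-identifier indexing in the preferred embodiment can be sketched as follows. The patent names PERL; the sketch below uses Python for illustration, and the key format (rounded coordinates joined with a normalized name) and field names are assumptions, not the claimed concatenation scheme.

```python
def concat_key(rec):
    """Build a concatenated identifier from rounded location and normalized name."""
    name = "".join(ch for ch in rec["name"].lower() if ch.isalnum())
    return f"{round(rec['lat'], 3)}|{round(rec['lon'], 3)}|{name}"

def build_index(records):
    """Index records by concatenated identifier; keys with multiple records
    are candidate duplicates, while singletons are candidates for elevation
    to a higher-level data layer."""
    index = {}
    for rec in records:
        index.setdefault(concat_key(rec), []).append(rec)
    return index
```

Successive passes could then re-index with looser rounding or different name normalizations to catch duplicates that a single key format misses.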