A computer program listing appendix entitled “Appendix to Ser. No. 09/229,593” and contained on a compact disc submitted herewith is incorporated herein by reference in its entirety. Applicant submits two compact discs, one original plus an identical copy, containing one file with the title “Appendix to Ser. No. 09/229,593”.
1. Field of the Invention
The present invention relates generally to methods for exchanging data between disparate data hosts including application programs and data bases. More specifically, the present invention relates to a user transparent process for exchanging and routing data representing postal address information between disparate data hosts.
2. Description of the Prior Art
Application programs and databases, including relational databases, are examples of data hosts used for generating, manipulating, and storing data. A wide variety of data hosts are commercially available for managing many different types of data for a multitude of purposes. Application programs and databases typically include strict rules for defining composite data types that may be used therein. The data types may include records, arrays and other structures.
Generally, data formats may be categorized as either plain text data, or parsed and tagged data. Plain text data is of variable length and composition and is not easily parsed into fields, and therefore there are no portions of the plain text data which are separately identifiable. Plain text data is most commonly managed in word processing type application programs. In database files, data is generally managed in a parsed and tagged type of format either by a database manager or by a special purpose application program.
Database files generally include data records and header records. In general, database files may be managed either by a database manager or by a special-purpose application program. A database manager provides for a user to specify record structures upon creation of the database file. A record structure is generally described by field names, data formats, and byte offsets or specific delimiters in the record. Database manager programs maintain data dictionary records as headers in the database file, the records typically specifying parameters associated with each field including a name, a start byte offset, and a data format. Special-purpose application programs are used to generate and manipulate databases of one specified record structure, the specification of which is embedded in the code of the program rather than in header records of the file. Currently, there is no standard internal data format used by all application programs and data base managers. Application programs and data bases typically use complex proprietary data formats.
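As an illustration of a record structure described by field names, start byte offsets, and data formats, the following minimal sketch slices a fixed-width record using a small data dictionary. The `FieldSpec` class, the field names, and the 40-byte layout are hypothetical and chosen only for this example; they are not taken from any particular database manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str    # field name
    offset: int  # start byte offset within the record
    length: int  # field width in bytes (ASCII text assumed)


# Hypothetical data dictionary for a 40-byte address record.
DATA_DICTIONARY = [
    FieldSpec("name", 0, 20),
    FieldSpec("city", 20, 15),
    FieldSpec("zip", 35, 5),
]


def read_record(record: bytes, dictionary=DATA_DICTIONARY) -> dict:
    """Slice a fixed-width record into named fields using the dictionary."""
    return {
        f.name: record[f.offset:f.offset + f.length].decode("ascii").rstrip()
        for f in dictionary
    }


# Build a sample record padded with spaces to the declared field widths.
record = b"Jane Doe".ljust(20) + b"Springfield".ljust(15) + b"12345"
print(read_record(record))
```

A special-purpose application program would embed the equivalent of `DATA_DICTIONARY` in its own code rather than reading it from header records of the file.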
The disparity in internal data formats between different types of application programs and database managers causes problems for users who wish to exchange data between these disparate data hosts. A disparity in internal data formats from one data host to another may also arise due to the use of different compilers and different hardware architectures, sometimes referred to as “platforms”. Application programs and databases are written in a higher order language, and then compiled by other programs called compilers. The same compiler used on different computers may result in different internal data formats for the same data. Different compilers used on identical platforms may also result in different internal data formats. Another problem is that different compilers and platforms may use different byte ordering, including Big-Endian and Little-Endian byte ordering.
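The byte ordering problem can be illustrated with a short sketch using Python's standard `struct` module. The value `0x12345678` is an arbitrary example; the point is only that the same 32-bit integer has two different byte layouts, and that reading one layout as the other yields a different number.

```python
import struct

value = 0x12345678

# The same 32-bit unsigned integer in the two byte orders.
big_endian = struct.pack(">I", value)     # most significant byte first
little_endian = struct.pack("<I", value)  # least significant byte first

print(big_endian.hex())     # 12345678
print(little_endian.hex())  # 78563412

# Interpreting big-endian bytes as little-endian silently corrupts the value.
misread = struct.unpack("<I", big_endian)[0]
print(hex(misread))         # 0x78563412
```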
It has become increasingly desirable for users to be able to conveniently exchange data between disparate application programs and databases running on disparate computer platforms including desk top computers, hand held computers, and web servers. Due to the disparities in the internal data formats of the various data hosts, transfer of data between disparate data hosts typically is not readily achievable via ordinary file transfer. The different internal data formats must be reconciled for disparate data hosts to communicate with each other. When information is to be exchanged between disparate data hosts, some form of data format conversion is required.
A variety of prior art techniques have been developed specifically for exchanging data between handheld computers and desktop computers. Handheld computers, such as personal digital assistants (PDAs), typically provide some combination of personal information management functions, database functions, word processing functions, and spreadsheet functions. Due to limitations in memory size and processing power, handheld computers are generally limited in functionality and differ in data content and usage from similar applications on desktop computers. Many users of handheld computers also own a desktop computer which may be used for application programs that manage data similar to the data stored in the handheld computer. A user typically stores the same data on the desktop computer and the handheld computer. Therefore, it is very desirable for a user to be able to conveniently exchange data between desktop application programs and databases, and memory resident data sets of a handheld computer.
Data exchange between disparate application programs is also very important in electronic commerce wherein computer systems are interconnected through computer networks of various configurations. Networked computer systems have allowed for the emergence of many different types of transactions between users operating disparate application programs running on disparate computer platforms. A recent development in the World Wide Web is the capability to send data from web clients back to a web server using fill-in “forms”. This enables web users to enter information such as, for example, credit card numbers and addresses for purchases made over the Internet. In the growing field of electronic commerce, many such information transactions are becoming commonplace for varying purposes. A “form” typically includes standard graphic user interface (GUI) controls such as text boxes, check boxes, and menus. Each control is given a name that eventually becomes a variable item that a processing script uses. Text and password boxes can be used to create registration forms which include fields representing an address, including a name field, a street address field, a city field, a state field, a zip code field, a phone number field, an e-mail address field, and a web address field.
In accordance with one type of prior art method for exchanging data between disparate data hosts, a user must call separate services to encode and decode basic data field types or to define messages in a separate language syntax that will be used for information exchange. These prior approaches do not provide transparent data exchange, and impose a significant translation overhead on the systems involved.
Crozier (U.S. Pat. No. 5,701,423, issued Dec. 23, 1997) discloses a computer implemented method for translating computer data from a source record structure having information arranged in a source file, to a destination record structure. Each of the source and destination record structures includes a plurality of fields, each having a name. The destination record structure differs from the source record structure in field name, field order, or one-to-many or many-to-one field correspondence. The source file exists on a first computer and the destination record structure is specified by a program for execution on a second computer. The method includes the steps of: presenting the names of the fields of each of the source and destination record structures on a display; allowing a user to interactively select a field from the source record structure and a corresponding field from the destination record structure, thereby establishing a mapping between the fields; and translating the information of the source file, which is arranged in the source record structure, into a form compatible with the destination record structure in accordance with the mapping. This method is not transparent to the user because it places a burden of defining a mapping model for data translation on the user of the data hosts.
What is needed is a process for user-transparent exchange of data between disparate data hosts running on disparate computer platforms including hand held computers, desk top computers, and web servers, wherein the process provides automatic mapping between fields of a source data host and corresponding fields of a destination data host.
What is also needed is a process for user-transparent exchange of data between disparate data hosts wherein if the internal data format of the source data host is a plain text data format, the process provides automatic parsing of the plain text data into a plurality of data portions having corresponding tags associated therewith, each of the tags indicating a type of information represented by the corresponding data portion.
Further needed is a process for user-transparent exchange of data between disparate data hosts running on disparate computer platforms, wherein the process facilitates more convenient transactions in electronic commerce.
It is an object of the present invention to provide a process for user-transparent exchange of data between disparate data hosts running on disparate computer platforms including hand held computers, desk top computers, and web servers, wherein the process provides automatic mapping between fields of a source data host and corresponding fields of a destination data host.
It is also an object of the present invention to provide a process for user-transparent exchange of data representing postal address information between disparate data hosts wherein the process provides automatic mapping between fields of a source data host and corresponding fields of a destination data host.
It is a further object of the present invention to provide a process for exchanging data representing postal address information between disparate data hosts wherein if the internal data format of the source host is a plain text data format, the process provides automatic parsing of the plain text data into a plurality of data portions having corresponding tags associated therewith, each of the tags indicating a type of information represented by the corresponding data portion.
Briefly, a presently preferred embodiment of the present invention includes a data exchange process for transferring data representing a geographical address from a source host using a source data format to a destination host using a destination data format. The process includes the steps of: using a first driver to extract a data block from the source host and to convert the format of the data block from the source data format to an intermediate data format; temporarily storing the data block in an intermediate memory storage location; and determining if the data block includes plain text data which is not parsed and identified by corresponding tags.
If the data block includes plain text data which is not parsed and identified by corresponding tags, the process provides for automatically parsing the data block into a plurality of data portions having corresponding tags associated therewith, each of the tags indicating a type of information represented by the corresponding data portion. A second driver is used to convert the format of the data block from the intermediate data format to the destination data format, and to insert the data block into the destination host.
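The overall flow summarized above (first driver, intermediate format, conditional parsing, second driver) might be sketched as follows. This is only an illustration under stated assumptions: the intermediate format is modeled as either a plain string or a dictionary of tags, and the names `exchange`, `word_processor_driver`, `placeholder_parser`, and `contact_db_driver` are hypothetical. The placeholder parser tags lines purely by position and does not represent the parsing process of the invention.

```python
from typing import Callable, Dict, List, Union

# Assumed intermediate format: plain text, or a dict mapping tag -> data portion.
Intermediate = Union[str, Dict[str, str]]


def exchange(
    extract: Callable[[], Intermediate],
    parse: Callable[[str], Dict[str, str]],
    insert: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    block = extract()           # first driver: source format -> intermediate
    if isinstance(block, str):  # plain text that is not yet parsed and tagged
        block = parse(block)    # automatic parsing into tagged data portions
    insert(block)               # second driver: intermediate -> destination
    return block


def word_processor_driver() -> Intermediate:
    """Hypothetical source driver returning untagged plain text."""
    return "Jane Doe\n12 Main St\nSpringfield, IL 62704"


def placeholder_parser(text: str) -> Dict[str, str]:
    """Stand-in parser: tags non-empty lines by position only."""
    tags = ["name", "address", "city_state_zip"]
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return dict(zip(tags, lines))


contacts: List[Dict[str, str]] = []  # stands in for the destination host


def contact_db_driver(block: Dict[str, str]) -> None:
    """Hypothetical destination driver; this host uses upper-case field names."""
    contacts.append({tag.upper(): value for tag, value in block.items()})


exchange(word_processor_driver, placeholder_parser, contact_db_driver)
print(contacts)
```

Because each driver converts only to or from the intermediate format, a new data host would need one driver rather than a converter for every other host.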
The step of automatically parsing the data block into a plurality of data portions includes the steps of: identifying a plurality of text strings of the plain text data; and comparing the text strings to a plurality of predefined patterns to determine pattern matches between the text strings and the predefined patterns. The step of identifying a plurality of text strings of the plain text data includes: assigning a line number to each of a plurality of text lines of the plain text data; and assigning a starting position value and an ending position value to each of the text strings of each text line of the plain text data.
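A minimal sketch of these two steps is given below. It assumes that a text string is a run of characters delimited by whitespace or commas, that line numbers are 1-based, and that positions are 0-based with an exclusive end. The `TextString` class and the three entries of `PATTERNS` are hypothetical examples, not the predefined patterns of the appendix.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextString:
    line: int   # 1-based line number
    start: int  # starting position within the line (0-based)
    end: int    # ending position within the line (exclusive)
    text: str


# Hypothetical predefined patterns, each applied to a whole text string.
PATTERNS = {
    "zip": re.compile(r"\d{5}(-\d{4})?"),
    "state": re.compile(r"[A-Z]{2}"),
    "number": re.compile(r"\d+"),
}


def identify_strings(plain_text: str) -> List[TextString]:
    """Assign a line number and start/end positions to every text string."""
    strings = []
    for line_no, line in enumerate(plain_text.splitlines(), start=1):
        for m in re.finditer(r"[^\s,]+", line):
            strings.append(TextString(line_no, m.start(), m.end(), m.group()))
    return strings


def match_patterns(s: TextString) -> List[str]:
    """Return the names of all patterns that the text string fully matches."""
    return [name for name, pat in PATTERNS.items() if pat.fullmatch(s.text)]


for s in identify_strings("Jane Doe\n12 Main St, Springfield, IL 62704"):
    print(s, match_patterns(s))
```

Note that a single text string may match several patterns (for example, a five-digit string matches both `zip` and `number` here), which is one reason a later weighting step is useful.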
In the preferred embodiment, the step of automatically parsing the data block into a plurality of parsed data portions further includes the steps of: contextually analyzing the text strings, including determining positional relationships between various ones of the pattern matches; generating a plurality of probability weight factors for each of the text strings based on the pattern matches and the positional relationships between the various ones of the pattern matches, each of the probability weight factors indicating a probability that the corresponding text string represents a corresponding type of information; and determining the data portions and the corresponding tags based on the pattern matches and the probability weight factors.
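The interplay between pattern matches and positional relationships can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the example operates on whole lines rather than individual text strings, the two regular expressions are simplistic, and the numeric weights (0.6, 0.5, 0.3) are arbitrary and are not values from the invention.

```python
import re
from typing import Dict, List

ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")  # five-digit zip, optional +4
STREET = re.compile(r"^\d+\s+\S+")         # line starting with a house number


def weigh(lines: List[str]) -> List[Dict[str, float]]:
    """Return one dict of tag -> weight per line."""
    weights = [{"name": 0.0, "address": 0.0, "city_zip": 0.0} for _ in lines]

    # Weights contributed by pattern matches alone.
    for i, line in enumerate(lines):
        if ZIP.search(line):
            weights[i]["city_zip"] += 0.6
        if STREET.match(line):
            weights[i]["address"] += 0.5

    # Weights contributed by positional relationships between matches.
    for i, line in enumerate(lines):
        # A line directly above a zip-code line is more likely a street address.
        if i + 1 < len(lines) and ZIP.search(lines[i + 1]):
            weights[i]["address"] += 0.3
        # A first line without digits is more likely a name.
        if i == 0 and not any(ch.isdigit() for ch in line):
            weights[i]["name"] += 0.5
    return weights


def tag_lines(lines: List[str]) -> List[str]:
    """Pick the highest-weighted tag for each line."""
    return [max(w, key=w.get) for w in weigh(lines)]


print(tag_lines(["Jane Doe", "12 Main St", "Springfield, IL 62704"]))
```

In this toy version a five-digit house number would also match the zip pattern, which illustrates why pattern matches alone are ambiguous and contextual weighting is needed.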
The step of identifying the plurality of text strings includes the steps of: reading plain text data of the data block; sorting the plain text data into a plurality of text lines; determining spaces, tabs, and punctuation marks in the plain text data; collapsing multiple spaces on each text line to a single space; and for each tab found, beginning a new text line and deleting the tab.
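The normalization steps listed above can be sketched in a few lines. Trimming leading and trailing spaces and discarding empty lines are assumptions added for this example; the function name `normalize_lines` is hypothetical.

```python
import re
from typing import List


def normalize_lines(plain_text: str) -> List[str]:
    """Split plain text into text lines, treating each tab as a line break."""
    lines = []
    for raw in plain_text.splitlines():
        # Each tab begins a new text line; the tab itself is deleted.
        for piece in raw.split("\t"):
            # Collapse runs of multiple spaces to a single space.
            piece = re.sub(r" {2,}", " ", piece).strip()
            if piece:  # assumption: empty lines carry no information
                lines.append(piece)
    return lines


print(normalize_lines("Jane   Doe\tAcme  Corp\n12 Main St"))
# ['Jane Doe', 'Acme Corp', '12 Main St']
```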
The probability weight factors include: name probability weights each indicating a probability that a corresponding text string represents a name; company name probability weights each indicating a probability that a corresponding text string represents a company name; address probability weights each indicating a probability that a corresponding text string represents an address; city name probability weights each indicating a probability that a corresponding text string represents a city name; zip code probability weights each indicating a probability that a corresponding text string represents a zip code; and title probability weights each indicating a probability that a corresponding text string represents a title.
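One way to hold the six kinds of probability weight factors for a single text string is a small record type, sketched below. The class name `WeightFactors`, the `best_tag` method, and the 0.5 threshold are hypothetical; the text above does not specify how a tag is selected from the weights.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class WeightFactors:
    """Probability weight factors for one text string (values in 0..1)."""
    name: float = 0.0
    company_name: float = 0.0
    address: float = 0.0
    city_name: float = 0.0
    zip_code: float = 0.0
    title: float = 0.0

    def best_tag(self, threshold: float = 0.5) -> Optional[str]:
        """Return the highest-weighted tag, or None if nothing is convincing."""
        tag, weight = max(asdict(self).items(), key=lambda kv: kv[1])
        return tag if weight >= threshold else None


print(WeightFactors(zip_code=0.9, address=0.2).best_tag())  # zip_code
print(WeightFactors(title=0.1).best_tag())                  # None
```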
An important advantage of the present invention is that a user is not required to specify a mapping between fields of the source data host and fields of the destination data host.
Another advantage of the present invention is that plain text data representing postal address information can be automatically parsed, tagged, and transferred from a source host to data fields of a destination host.
The foregoing and other objects, features, and advantages of the present invention will be apparent from the following detailed description of the preferred embodiment which makes reference to the several figures of the drawing.