The invention relates to the field of databases, in particular, to data storage and data compression, and to technology for searching and retrieving data from a data store. The invention also relates to performing householding queries on end user data.
There are many existing methods for constructing database queries, searching databases remotely, and retrieving search results. For example, U.S. Pat. No. 5,857,197 (Mullins Jan. 5, 1999) entitled "SYSTEM AND METHOD FOR ACCESSING DATA STORES AS OBJECTS" describes, according to the Abstract, a system and a method for accessing a data store as objects from an object application. The accessed data store can be either an object data store or a non-object (e.g. relational) data store. The system includes an object schema including meta data corresponding to a data store schema and an adapter abstraction layer. The adapter abstraction layer comprises a first adapter and a second adapter. One embodiment of the system includes an object schema manager to create and maintain the object schema at run time. It comprises a dynamic, scalable, centrally managed, and secure method for accessing data stored in both object and non-object (e.g. relational) data stores, effecting a consistent interface to the data store regardless of its underlying structure, or a method of transport and level of security.
U.S. Pat. No. 5,787,411 (Groff et al. Jul. 28, 1998) entitled "METHOD AND APPARATUS FOR DATABASE FILTER GENERATION BY DISPLAY SELECTION" describes, according to the Abstract, a method for selecting records from a displayed database table by generating an SQL SELECT command for filtering the displayed records in accordance with cell values highlighted by user input. A presently selected set of records from a desired table (generally referred to as a record set or record source) is displayed on the user's display screen. The user selects particular values in cells (an intersection of a row and a column of the displayed table) by highlighting the values using the pointer device or keyboard of the computer system. Methods of the present invention then generate an SQL select (filter) command to selectively retrieve those records from the displayed records which match the user's highlighted values. A fully highlighted cell indicates exact equality is desired by the user, a beginning portion highlighted indicates that the user wishes to match records whose corresponding column starts with the highlighted value, an ending portion selection matches the ending portion of qualified records, and a middle portion highlighted matches any record containing the highlighted value. Values highlighted in the same row generate logically AND'd clauses in the SELECT command while the comparisons generated for a row are logically OR'd with the comparisons generated for other rows. The user may indicate that the highlighted values are for selection (inclusion of qualified records) or for exclusion selection (exclusion of qualified records). A new select (filter) command may be logically AND'd with the prior filter to permit complex selection criteria to be defined by simple graphical user inputs.
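By way of illustration only, the filter-generation rule summarized above (AND within a highlighted row, OR across rows, with the match type chosen by which portion of the cell is highlighted) might be sketched as follows. This is not the implementation of the cited patent; the function name `build_filter`, the mode labels, and the table and column names are hypothetical, and the column names are assumed to be trusted identifiers.

```python
def build_filter(table, highlighted_rows):
    """Build a parameterized SELECT from highlighted cells.

    highlighted_rows is a list of rows; each row is a list of
    (column, value, mode) tuples, where mode is one of
    "full", "begin", "end", or "middle" (hypothetical labels).
    """
    params = []
    row_clauses = []
    for row in highlighted_rows:
        cell_clauses = []
        for column, value, mode in row:
            if mode == "full":
                # Fully highlighted cell: exact equality.
                cell_clauses.append(f"{column} = ?")
                params.append(value)
            else:
                # Partially highlighted cell: prefix, suffix, or substring match.
                pattern = {
                    "begin": f"{value}%",
                    "end": f"%{value}",
                    "middle": f"%{value}%",
                }[mode]
                cell_clauses.append(f"{column} LIKE ?")
                params.append(pattern)
        # Cells highlighted in the same row are AND'd together.
        row_clauses.append("(" + " AND ".join(cell_clauses) + ")")
    # Comparisons for different rows are OR'd together.
    sql = f"SELECT * FROM {table} WHERE " + " OR ".join(row_clauses)
    return sql, params


sql, params = build_filter(
    "people",
    [
        [("city", "Boston", "full"), ("name", "Sm", "begin")],
        [("zip", "01", "end")],
    ],
)
```

The values are passed as bound parameters rather than being spliced into the SQL text, which is a design choice of this sketch and not something the cited Abstract specifies.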
U.S. Pat. No. 5,864,844 (James et al. Jan. 26, 1999) entitled "SYSTEM AND METHOD FOR ENHANCING A USER INTERFACE WITH A COMPUTER BASED TRAINING TOOL" describes, according to the Abstract, a method for enhancing a user interface with a computer based training tool comprising the steps of listing domain objects on a display; listing domain object values in response to a selection of one of the domain objects; generating a plurality of inquiries in response to a user selection of one of the domain object values; replying with a predetermined answer; identifying a new domain object value in the predetermined answer; and adding to the plurality of inquiries a new inquiry which incorporates both the selected domain object value and the new domain object value.
U.S. Pat. No. 5,787,412 (Bosch et al. Jul. 28, 1998) entitled "OBJECT ORIENTED DATA ACCESS AND ANALYSIS SYSTEM" describes, according to the Abstract, a system for accessing and analyzing data through a central processing unit. The system includes a non-modal user interface to provide a user access to the system. A number of application graphics objects allow the user to visually interact with a plurality of analysis objects through the non-modal user interface. The plurality of application analysis objects allow a user to interactively create an analysis network for analyzing one or more databases. A plurality of application data access objects automatically interprets the analysis network and allows the system to access required databases and to generate the structured query language required to access and analyze the databases as defined within the analysis network.
U.S. Pat. No. 5,787,425 (Bigus Jul. 28, 1998) entitled "OBJECT-ORIENTED DATA MINING FRAMEWORK MECHANISM" describes, according to the Abstract, an object oriented framework for data mining that operates upon a selected data source and produces a result file. Certain core functions are performed by the framework, which interact with the extensible functions. This separation of core and extensible functions allows the separation of the specific processing sequence and requirements of a specific data mining operation from the common attributes of all data mining operations. The user may thus define extensible functions that allow the framework to perform new data mining operations without the framework having the knowledge of the specific processing required by those operations.
U.S. Pat. No. 5,761,663 (Lagarde et al. Jun. 2, 1998) entitled "METHOD FOR DISTRIBUTED TASK FULFILLMENT OF WEB BROWSER REQUESTS" describes, according to the Abstract, a World Wide Web browser which makes requests to web servers on a network which receive and fulfill requests as an agent of the browser client, organizing distributed sub-agents as distributed integration solution (DIS) servers on an intranet network supporting the web server, which also has access agent servers accessible over the Internet. DIS servers execute selected capsule objects which perform programmable functions upon a received command from a web server control program agent for retrieving, from a database gateway coupled to a plurality of database resources upon a single request made from a Hypertext document, requested information from multiple databases located at different types of databases geographically dispersed, performing calculations, formatting, and other services prior to reporting to the web browser or to other locations, in a selected format, as in a display, fax, printer, and to customer installations or to TV video subscribers, with account tracking.
However, there is a need for a substantial improvement in the efficiency of several data-intensive industries. For example, the credit industry currently requires an average of at least thirty days to respond to a customer request for information necessary to execute a pre-approved credit card mailing. Typically, a customer wishing to execute such a mailing will send the credit company a sample of the types of households to which the customer would like to send its mailing, which sample includes records containing specific addresses, account numbers, household incomes, and other household information. Because of the size of the sample, the records are generally sent on magnetic tape, which, as is well known, entails handling, transportation and storage costs and delays inherent in physical transportation. The customer also typically sends specifications for supplementing this sample data with data from the credit company data store in order to compile a complete mailing list for the offer.
The specifications might request that the data store identify 20,000 households in the same zip codes and income ranges as the households provided in the sample. The specifications might also request that once these households are identified, the data store run its own logic on this data to determine the creditworthiness of each household contained therein (a "FICA score") and return only those households with a FICA score of an acceptable level.
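For purposes of explanation only, a specification of this kind might be sketched as follows. The record layout, the fixed-width income bands, and the `score` callable standing in for the data store's proprietary scoring logic are all assumptions made for the example and are not part of the described process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Household:
    address: str
    zip_code: str
    income: int


def income_band(income, width=25_000):
    """Map an income to a band number (hypothetical fixed-width bands)."""
    return income // width


def select_like_sample(sample, store, score, min_score, limit):
    """Return up to `limit` store households that share a zip code and
    income band with some sample household and whose score is acceptable."""
    wanted = {(h.zip_code, income_band(h.income)) for h in sample}
    matches = [h for h in store if (h.zip_code, income_band(h.income)) in wanted]
    qualified = [h for h in matches if score(h) >= min_score]
    return qualified[:limit]


sample = [Household("1 A St", "10001", 60_000)]
store = [
    Household("2 B St", "10001", 55_000),
    Household("3 C St", "10001", 120_000),
    Household("4 D St", "20002", 60_000),
    Household("5 E St", "10001", 70_000),
]


def toy_score(household):
    """Placeholder only; real creditworthiness logic is not modeled here."""
    return household.income // 1000


selected = select_like_sample(sample, store, toy_score, min_score=60, limit=20_000)
```

In this toy data, only the household at "5 E St" shares a zip code and income band with the sample and also meets the minimum score.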
Finally, the specifications will specify the layout in which the customer would like to receive the finished report. Customer specifications may be significantly more complex than this example, which has been simplified for the purposes of explanation.
Upon receiving the request, the credit company must allocate resources to analyze the customer specifications and ensure that it correctly understands the customer's needs and the data that has been provided. Miscommunication between the credit company and the customer can lead to costly errors and reruns, particularly as requests become more sophisticated.
Once the credit company believes that it understands the specifications, it will often be required to write custom software code in order to standardize the records provided by the customer (which may contain a variety of record formats and types) so that they will interface with the data store's standard processing operation. Preparing custom code for each customer requires large numbers of programmers and time consuming data analysis, as can readily be appreciated.
Upon completion of the analysis, the credit company will run its proprietary FICA score logic on the results and may run "householding" logic to show relationships among different individuals sharing the same address, or to identify and eliminate redundant information. Large portions of the householding process are generally "hard coded" in advance, making it difficult to adjust householding parsing and matching parameters to the particular data received or the end user's particular needs. As a result, data may be mismatched and relationships among records may be undiscovered. Furthermore, householding large files often takes days or weeks to complete because large files and cumbersome parsing and matching procedures slow the process and prevent the efficient use of system memory.
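As a minimal sketch of what householding means in this context, the following groups individuals by a normalized address and drops duplicate entries. The normalization rules and the small abbreviation table are hypothetical and far simpler than any production matching logic.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real reference list would be much larger.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def normalize_address(address):
    """Upper-case, strip punctuation, and abbreviate common street types."""
    text = re.sub(r"[^\w\s]", "", address.upper())
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def household(records):
    """Group (name, address) records by normalized address.

    Names repeated at the same address are kept once, which removes
    redundant records while exposing who shares a household.
    """
    groups = defaultdict(list)
    for name, address in records:
        key = normalize_address(address)
        if name not in groups[key]:
            groups[key].append(name)
    return dict(groups)


records = [
    ("Ann Lee", "12 Oak Street"),
    ("Bob Lee", "12 Oak St."),
    ("Ann Lee", "12 OAK ST"),
    ("Cy Poe", "9 Elm Ave"),
]
households = household(records)
```

Here the three differently written "12 Oak" addresses collapse to one household containing Ann Lee and Bob Lee, with the duplicate Ann Lee record eliminated.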
Once processing is completed, custom code is then written to provide the results in the customer's preferred layout. Upon completion of the process, the credit company will likely return the results on magnetic tape because the volume of the data prohibits electronic transmission. This entire process may take one month or more, excluding reruns if there are miscommunications or mistakes at any stage of the process.
Therefore, a need exists for improvements in the area of database construction, queries, householding, and remote searching.
It is, therefore, a principal object of this invention to provide a method and apparatus for storing data as objects, constructing customized data retrieval and data processing requests, and performing householding queries.
It is another object of the invention to provide a method and apparatus that solves the above-mentioned problems so that database efficiency can be improved.
These and other objects of the present invention are accomplished by the method and apparatus disclosed herein.
Advantageously, according to an aspect of the invention, a universal data object (UDO) is provided. A universal data object is a combination of data and function/logical processing instructions used to perform a particular database search request and to execute customized processing instructions on the results of that request. Each universal data object consists of two distinct sections: (i) one or more data object headers and (ii) raw data. The universal data object is compressed at practically all stages of operation. The universal data object may include, but is not limited to, extended binary coded decimal interchange code (EBCDIC, an IBM System/360 and System/370 256-character code with 8 bits per character) and variable length files put in a standardized format, and an index of the compressed data. The invention provides for automated extraction of data object headers and capture of data profile statistics.
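The two-section layout described above (data object headers plus raw data, held in compressed form) can be pictured with the following minimal sketch. The class and field names, the fixed-width ASCII record layout, and the use of `zlib` are assumptions made for illustration; the actual header contents and compression method are not specified here.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FieldHeader:
    """Describes one field of a fixed-width record (hypothetical layout)."""
    name: str
    offset: int
    length: int


@dataclass
class UniversalDataObject:
    headers: list        # section (i): data object headers
    compressed: bytes    # section (ii): raw data, kept compressed
    record_length: int


def build_udo(field_widths, rows):
    """Standardize rows into fixed-width records and compress them."""
    headers = []
    offset = 0
    for name, width in field_widths.items():
        headers.append(FieldHeader(name, offset, width))
        offset += width
    raw = "".join(
        "".join(row[h.name].ljust(h.length)[: h.length] for h in headers)
        for row in rows
    ).encode("ascii")
    return UniversalDataObject(headers, zlib.compress(raw), offset)


def read_field(udo, row_index, field_name):
    """Read one field using the header to locate it within the record."""
    header = next(h for h in udo.headers if h.name == field_name)
    # Simplification: this sketch expands the whole payload on every read.
    raw = zlib.decompress(udo.compressed)
    start = row_index * udo.record_length + header.offset
    return raw[start : start + header.length].decode("ascii").rstrip()


udo = build_udo(
    {"zip": 5, "name": 10},
    [{"zip": "10001", "name": "Ann Lee"}, {"zip": "20002", "name": "Bob"}],
)
```

Because the headers carry each field's offset and length, a reader needs no custom code per record format; `read_field(udo, 1, "name")` locates the second record's name purely from the header.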
According to an aspect of the invention, the user builds a fully executable processing request, not mere specifications for a request.
According to an aspect of the invention, a universal data object carries large amounts of data married to processing instructions.
According to one aspect of the invention, the universal data object remains compressed at nearly all times.
According to an aspect of the invention, compressed universal data objects are stored in a flexible data store and are accessible to remote end users for data retrieval and processing.
According to an aspect of the invention, during compression, the data store is optimized for efficient data storage and retrieval by capturing data profile statistics and creating an index of the compressed data.
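One way to picture an index of compressed data built alongside profile statistics is sketched below: values are compressed in blocks, and the minimum, maximum, and count recorded for each block let a lookup skip blocks that cannot contain the requested value. The block size, the choice of statistics, and the use of `zlib` are assumptions of the example, which also assumes sorted, equal-length zip code strings.

```python
import zlib


def build_blocks(sorted_zips, block_size):
    """Compress sorted values in blocks and record per-block statistics."""
    blocks, index = [], []
    for i in range(0, len(sorted_zips), block_size):
        chunk = sorted_zips[i : i + block_size]
        blocks.append(zlib.compress("\n".join(chunk).encode("ascii")))
        index.append({"min": chunk[0], "max": chunk[-1], "count": len(chunk)})
    return blocks, index


def lookup(blocks, index, zip_code):
    """Return matching values, expanding only blocks whose range fits."""
    hits = []
    expanded = 0
    for block, stats in zip(blocks, index):
        if stats["min"] <= zip_code <= stats["max"]:
            expanded += 1
            values = zlib.decompress(block).decode("ascii").split("\n")
            hits.extend(v for v in values if v == zip_code)
    return hits, expanded


zips = ["10001", "10001", "10002", "20002", "30003", "30003"]
blocks, index = build_blocks(zips, block_size=2)
hits, expanded = lookup(blocks, index, "30003")
```

In this example only one of the three blocks is expanded to answer the lookup; the other two are ruled out from the index alone.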
According to an aspect of the invention, descriptive information about universal data objects and processing logic is displayed and made available for viewing and access by remote end users having access to a client application.
According to an aspect of the invention, the client application manipulates the descriptive information to create a customized data retrieval and processing request.
According to an aspect of the invention, an end user can add descriptive information about its own data to the request and attach its own compressed data to the request so that it will be included in data retrieval and processing.
According to an aspect of the invention, descriptive information representing data and logic stored in multiple remote data stores can be integrated into one request.
According to an aspect of the invention, the request can be submitted to several data stores in succession for automated data retrieval and processing without jeopardizing request security.
According to an aspect of the invention, the final, customized request object, therefore, may contain: (i) descriptive information identifying data; (ii) descriptive information identifying logic to be executed on the data; and (iii) compressed end user data.
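A request object with these three parts might be represented as in the following sketch. The class name, the string descriptors, and the use of `zlib` for the attached end user data are hypothetical and chosen only to make the structure concrete.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class RequestObject:
    # (i) descriptive information identifying data
    data_descriptors: list = field(default_factory=list)
    # (ii) descriptive information identifying logic to run on that data
    logic_descriptors: list = field(default_factory=list)
    # (iii) compressed end user data, empty when nothing is attached
    attached_data: bytes = b""

    def attach(self, raw):
        """Attach end user data to the request in compressed form."""
        self.attached_data = zlib.compress(raw)


request = RequestObject(
    data_descriptors=["zip_code", "income_range"],
    logic_descriptors=["score"],
)
request.attach(b"10001,60000\n")
```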
According to an aspect of the invention, the final request is submitted to the appropriate data store(s) and/or vendor(s) for automated processing in accordance with the customized instructions, and the data result set is returned in a format specified by the end user.
According to an aspect of the invention, the contents of the data store and the request can be processed almost entirely in their compressed form.
According to an aspect of the invention, a method of performing householding queries comparing input data with reference lists to produce data output is described, the method comprising building input patterns from data input submitted by an end user, displaying the input data for viewing by an end user utilizing a graphical user interface (GUI), allowing the end user to construct an output table containing output table symbols, mapping the input patterns to the output table symbols, mapping the output table symbols to a reference list, generating search patterns derived from the mapping of input patterns to the output table symbols that are mapped to the reference list, and using the search patterns to parse the data input against the reference lists to produce data output.
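The sequence of steps recited above can be sketched, under simplifying assumptions, as follows. The token classes (N, A, X), the output table symbols, the reference list contents, and the single pattern mapping are all hypothetical stand-ins for choices an end user would make through the GUI.

```python
def input_pattern(line):
    """Build an input pattern: N = numeric, A = alphabetic, X = other."""
    classes = []
    for token in line.split():
        if token.isdigit():
            classes.append("N")
        elif token.isalpha():
            classes.append("A")
        else:
            classes.append("X")
    return " ".join(classes)


# Mapping of input patterns to output table symbols (hypothetical).
PATTERN_TO_SYMBOLS = {
    "N A A": ["HOUSE_NUMBER", "STREET_NAME", "STREET_TYPE"],
}

# Mapping of output table symbols to reference lists (hypothetical).
REFERENCE_LISTS = {
    "STREET_TYPE": {"ST", "STREET", "AVE", "AVENUE"},
}


def parse(line):
    """Parse one input line into an output row, or None if it does not fit.

    The search pattern is the symbol sequence mapped from the line's input
    pattern; a token is accepted for a symbol only if it appears in that
    symbol's reference list, when one exists.
    """
    symbols = PATTERN_TO_SYMBOLS.get(input_pattern(line))
    if symbols is None:
        return None
    row = {}
    for symbol, token in zip(symbols, line.split()):
        allowed = REFERENCE_LISTS.get(symbol)
        if allowed is not None and token.upper() not in allowed:
            return None
        row[symbol] = token
    return row


parsed = parse("12 Oak Street")
```

Because the mappings are plain data rather than hard-coded logic, an end user could adjust them for other address conventions without changing `parse` itself.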
According to another aspect of the invention, end users have broad flexibility in customizing their householding parameters, including end-user specifications to improve the flexibility, speed and efficiency of the householding process. End user householding specifications are entered through the invention's GUI, which allows end users to review summary information about their data and specify the data parsing (i.e., data cleansing and classifying) procedures that they wish the invention to perform. End users can access and use the GUI to tailor householding parameters to accommodate their parsing requirements. For example, an end user can tailor parsing parameters to ensure that non-traditional data inputs, such as data from different countries or different systems, are correctly parsed.
According to another aspect of the invention, householding speed and efficiency are improved by using the information entered by the end user to prioritize parsing operations and eliminate unnecessary processing. Specialized, indexed parsing tables are created in memory to further increase the speed of the parsing process. These parsing tables are designed to maximize use of system memory, thus eliminating extensive I/O exchanges that slow traditional householding processes.
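A minimal sketch of such a table, assuming a Python dictionary as the in-memory index, is given below. The reference rows and category names are hypothetical; the point illustrated is only that the end user's specifications shrink the table before parsing begins and that each lookup is then a single in-memory operation.

```python
def build_parsing_table(reference_rows, wanted_categories):
    """Build an in-memory lookup table limited to the categories the end
    user asked for, keyed by upper-cased token."""
    table = {}
    for token, category in reference_rows:
        if category in wanted_categories:
            table[token.upper()] = category
    return table


def classify(tokens, table):
    """Classify tokens with one dictionary lookup each."""
    return [(token, table.get(token.upper(), "UNKNOWN")) for token in tokens]


reference = [
    ("St", "STREET_TYPE"),
    ("Ave", "STREET_TYPE"),
    ("Dr", "TITLE"),
    ("Mr", "TITLE"),
]
# The end user has asked only for street parsing, so titles are left out.
table = build_parsing_table(reference, {"STREET_TYPE"})
classified = classify(["Oak", "St"], table)
```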
Advantageously, the present invention automates the existing process to a degree that was previously impossible. Data stored in the credit company data store, such as was described above, is transformed into standardized, compressed data objects according to an aspect of the invention.
According to an aspect of the invention, the universal data object is optimized for data storage and retrieval with methods for automatically capturing and displaying data profile statistics and storing data in indexed, compressed blocks that facilitate rapid data retrieval.
According to an aspect of the invention, the credit company data store is provided with a method for creating an electronic display of its data, data profile statistics, and data processing logic, which display is made available to customers.
According to an aspect of the invention, a customer accesses this displayed information and uses a GUI that generates an application programming language ("APL") to construct an executable request for data retrieval and data processing from this displayed information.
According to an aspect of the invention, information about the customer's data can be integrated into the request, and the customer's own compressed data can be attached to the request so that it will be automatically processed at the data store.
According to an aspect of the invention, the request (including customer data, if any) is submitted to the data store electronically, and the request is executed in accordance with the APL instructions.
According to an aspect of the invention, once the customer builds and sends the request, the entire process is automated and performed on compressed data, with almost no expanding of data at any time during the process, unless it is passed to an external C program, householding, or other program that requires uncompressed data.
According to an aspect of the invention, a compressed result set is automatically returned to the customer in a layout specified by the customer on the GUI.
Thus, the invention dramatically improves the efficiency of the current process. Advantageously, because the customer can review the entire range of data and logic options made available by the data store and construct its own request, the data store does not need to interpret customer specifications or take responsibility for miscommunications. Instead, the customer can visit the credit company's Internet site and choose the household zip codes and income categories necessary for its request by double-clicking on the credit company display.
The customer can also double-click on available logic, such as FICA score, and use the GUI to specify that this logic will be executed on the result set of the data request.
The customer has the option to specify that its own compressed data should be included in the request for processing or to be householded with data store data to identify relationships and eliminate redundant information.
Once the executable request is completed, the request is electronically transmitted to the data store and the entire process is automated. Because the universal data objects and the customer data are already maintained in a standardized object format (e.g., with data object headers), the APL can read and manipulate specified data and logic without additional intervention, and the customer data will be compatible with the data store data for householding or other purposes.
The data is returned in a layout specified by the end user on the GUI. Thus, advantageously, data object standardization and layout specification eliminate the need for programmers to write code to submit customer data to the credit company's standard process and to prepare the report layout.
Unlike the current process, the process according to an exemplary embodiment of the invention is performed with almost no expanding of data at any stage (unless it is passed to an external program that requires uncompressed data), thus saving valuable storage space. Advantageously, the indexing of the data store data enhances the efficiency of data retrieval and reduces the run time for the entire process.
By contrast, ordinary data stores cannot implement a similar process with currently available technology.
The efficiencies described herein are achieved, according to an embodiment of the invention, with an integrated system comprising (i) compressed, universal data objects (at the data store and optionally at the customer); (ii) a consistent APL to access and process data display information and universal data objects at both the data store and the end user site; and (iii) an efficient compression method that can process customer and data store data with almost no uncompressing unless data is passed out of the system or householded according to the end user's instructions.
Advantageously, the integration of data objects, data displays, the APL and the compression method into a coherent system makes it possible to automate data exchange, data retrieval and data processing to an extent previously impossible.
In contrast with current products, the invention allows the end user to control householding specifications directly and to tailor householding parameters according to the end user's data input and desired output. Through use of a GUI, the end user can manipulate data input and parsing and matching criteria to produce a highly tailored, householded output, without intervention by a third party and without submitting data to a standardized "black box" process.
The invention improves the speed and efficiency of the householding process by using the end user's householding specifications to prioritize parsing operations and eliminate unnecessary processing. The invention also uses the end user's specifications to reduce the amount of data against which the end user's data is parsed by creating smaller parsing tables with indices that allow rapid data lookup and retrieval. The invention maximizes the use of system memory, thus eliminating extensive I/O exchanges that slow traditional householding processes.
It should be kept in mind that the process set forth above is illustrative of one exemplary application of the invention, but there are many other possible applications. In general, the invention has the potential to generate efficiencies wherever end users access remote data stores or seek to household different records.
As noted above, so long as data stores store their data in the prescribed object form, it is possible to submit a request to multiple data stores, so that data might be obtained from one data store and automatically submitted to processing logic obtained at another data store, or a request for data retrieval might be constructed using information from several data stores and then sent seriatim to each data store to retrieve all relevant information and return the results to the end user.
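The seriatim case might be sketched as follows, with each data store modeled as a callable that accepts the same request and returns its matching records. The `make_store` helper, the request shape, and the `source` tag are hypothetical; transport and request security are outside the scope of the sketch.

```python
def make_store(name, rows):
    """Create a toy data store that answers zip code requests."""
    def query(request):
        return [
            dict(row, source=name)
            for row in rows
            if row["zip_code"] in request["zip_codes"]
        ]
    return query


def submit_seriatim(request, stores):
    """Send one request to each data store in turn and merge the results."""
    combined = []
    for store in stores:
        combined.extend(store(request))
    return combined


store_a = make_store("A", [{"zip_code": "10001"}, {"zip_code": "20002"}])
store_b = make_store("B", [{"zip_code": "10001"}])
results = submit_seriatim({"zip_codes": {"10001"}}, [store_a, store_b])
```

Because every store accepts the same request shape, the request can travel from store to store unchanged, which is the property the common object form is meant to provide.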
These and other aspects of the invention will become apparent from the detailed description set forth below.