Information is a predominant feature of the modern world; significant resources are dedicated to obtaining, organizing, storing, and accessing it. Indeed, much of the world's computing power is dedicated to maintaining and efficiently using information, typically stored in databases. A database is a logical collection of data, in the form of text, numbers, or encoded graphics, organized for storage as a unit, typically in a computer. Databases are commonly organized into tables, which are simple row-and-column arrangements of related data that characterize persons, products, companies, electronic mail, contact information, financial data, records of sales, performance data of processing units, or anything else about which data is collected. In a typical database, the rows of a table represent records, or collections of information about separate items. Each horizontal record contains one or more fields, representing individual data elements. Each vertical column of the table represents one field that is stored for each row in the table. The database records may contain any type of data, and that data can be searched, accessed, and modified by the user. Businesses and other organizations use databases to manage information about clients, orders, client accounts, etc.
Realizing the importance of meaningful storage of data, in 1970 Dr. E. F. Codd developed the relational model of databases, based on how users perceive data and on a mathematical theory of relations. The relational model represents data as two-dimensional logical entities in which each logical entity represents some real-world person, place, thing, or event about which information is collected. A relational database is a set of tables derived from logical entities and manipulated in accordance with the relational model of data. The relational database uses objects to store, manage, and access data; the basic objects in the database are tables, columns, views, indexes, constraints (relationships), and triggers. Articles by Dr. E. F. Codd throughout the 1970s and 80s, such as "Twelve Rules for Relational Databases", "Is Your DBMS Really Relational?", published in COMPUTERWORLD on Oct. 14, 1985, and "Does Your DBMS Run By the Rules?", published in COMPUTERWORLD on Oct. 21, 1985, are still referenced for implementation of relational databases. The twelve rules have since been expanded to 333 and are published in The Relational Model for Database Management, Version 2 (Addison-Wesley, 1990).
A relational database stores information in tables as rows and columns of related data, and allows searches by using data in specified columns of one table to find additional data in another table. In searches of a relational database, a database server matches information from a field in one table with information in a corresponding field of another table and combines them to generate a third table that contains the requested data from the two tables. As an example, if one database table contains the fields name, serial, address, and phone, and another table contains the fields serial, salary, and bonus, a relational database can match the serial fields in the two tables to find such information as the names and bonuses of all people whose salary is above or below a certain amount. Thus, a relational database matches values in two or more tables to relate information in one table to information in another table. Computer databases are typically relational databases.
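The two-table match described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names person and pay, the sample rows, and the helper names_and_bonuses_above are assumptions made for the example, while the field names follow the text.

```python
import sqlite3

# In-memory database holding the two example tables (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT, serial INTEGER, address TEXT, phone TEXT);
    CREATE TABLE pay (serial INTEGER, salary INTEGER, bonus INTEGER);
    INSERT INTO person VALUES
        ('Ada', 1, '1 Main St', '555-0101'),
        ('Ben', 2, '2 Oak Ave', '555-0102'),
        ('Cy',  3, '3 Elm Rd',  '555-0103');
    INSERT INTO pay VALUES (1, 90000, 5000), (2, 40000, 1000), (3, 75000, 3000);
""")


def names_and_bonuses_above(connection, threshold):
    """Match the serial fields of both tables and keep rows above a salary threshold."""
    cursor = connection.execute(
        """
        SELECT person.name, pay.bonus
        FROM person
        JOIN pay ON person.serial = pay.serial
        WHERE pay.salary > ?
        ORDER BY person.name
        """,
        (threshold,),
    )
    return cursor.fetchall()


result = names_and_bonuses_above(conn, 50000)
print(result)
```

The rows returned by the query play the role of the "third table" in the description: they combine a column from each source table, related only through the shared serial value.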
In today's world, database systems are collections of files stored on computers that may or may not be linked to other collections of data in the same system or in other linked systems such as the Internet. One or more large databases are stored on one or more servers. Users or applications, called clients, that may be located on the same or a different server issue requests to a database server for data in the database. These requests are called search requests. A directory is one kind of database: a set of objects with similar attributes organized in a logical and usually hierarchical arrangement. The most common example is the telephone directory, which has a series of names of either persons or organizations organized alphabetically, either by name or by the services provided, with each name having an address and phone number attached. Another common example is the directory of files in a computer. For instance, the main hard drive is usually given a label such as C:\ and applications or data stored on that hard drive may be given a path such as C:\Music\ or C:\Programs. The Lightweight Directory Access Protocol (LDAP) is an application protocol for querying and modifying directory services over TCP/IP, the Transmission Control Protocol/Internet Protocol suite. An LDAP directory tree often reflects various political, geographic, and/or organizational boundaries in a database but usually uses Domain Name System (DNS) names for the uppermost levels of the hierarchy. The DNS is a hierarchical naming system for computers, services, or any resource participating in the Internet that associates various data with the domain names assigned to such participants. Most importantly, it translates human-meaningful domain names, such as www.example.com, to the numerical (binary) identifiers, such as the IP address 208.77.188.166, associated with networking equipment for the purpose of locating and addressing these devices worldwide.
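As an illustration of how DNS names can form the uppermost levels of an LDAP tree, the following sketch maps a domain name to a distinguished name built from domain components (dc), a common convention. The function names domain_to_base_dn and entry_dn and the sample entry are assumptions for the example, and the sketch does not escape special characters in names.

```python
def domain_to_base_dn(domain: str) -> str:
    """Turn a DNS domain such as 'example.com' into 'dc=example,dc=com'."""
    labels = [label for label in domain.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join(f"dc={label}" for label in labels)


def entry_dn(relative_names, domain: str) -> str:
    """Build a full name from the most specific component down to the DNS-derived base.

    relative_names is ordered from leaf to root, e.g. ['cn=Jane Doe', 'ou=People'].
    """
    return ",".join(list(relative_names) + [domain_to_base_dn(domain)])


base = domain_to_base_dn("example.com")
full = entry_dn(["cn=Jane Doe", "ou=People"], "example.com")
print(base)
print(full)
```

Reading the resulting name from right to left walks the hierarchy from the DNS-derived top levels down to the individual entry, much like a file path read from the drive label inward.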
Indeed, the World Wide Web (WWW) is essentially a large database composed of an expansive network of interconnected computers upon which businesses, governments, groups, and individuals throughout the world maintain inter-linked computer files known as web pages. The volume of data available on the Internet is increasing daily, but the ability of users to access, understand, and transform data available on the Web, let alone their own data, has not kept pace. People now have the ability to access, capture, use, manipulate, and integrate data available on the Internet from multiple sources for such applications as data mining, data warehousing, global information systems, and electronic commerce. At least two problems now occur with access to so much data. One problem is that a search request to one or more databases can return so much data that the useful data may be hidden among hundreds of thousands of items returned. One need only run a Google search on a common word or phrase to understand the phenomenon. A different but related problem is that there can be so many requests for data to a database that the searching software cannot handle them efficiently. Access to search for data on directories and/or databases may be needed on a 24/7 basis, and millions of search requests can occur in a small period of time. Some of the search requests to a database may be repetitive; others may be intended to lock up or thwart a computer system; and some may accidentally lock the system in an endless processing loop because of a software bug, so that other search requests are not given access to the database. Another scenario is that some search requests must be given higher priority, either at all times or at specified times.
Spamming is the abuse of electronic messaging systems to indiscriminately send unsolicited bulk messages. Spamming clients can consume all of the database server resources and possibly all of the system resources. These clients are not necessarily malicious and might simply be runaway or faulty applications controlled by an administrator. Currently, there is no easy way to identify and deal with spamming client transactions until the spamming has already happened. It is not always preferable to simply block the spam, because the client is not necessarily malicious and important transactions might be inadvertently blocked. Software processes that manage the database may permit classification of requests into different enclaves and assign different priorities to these enclaves, but such management still does not identify spamming search requests. Administrators of the databases have to anticipate possible spamming clients and direct a search request from a possible spamming client to a separate work group or enclave. This is unrealistic because spam cannot be identified until it has already happened. Isolating possible spamming clients in advance, moreover, would require many enclaves to be created, some of which end up being unnecessary or repetitive. Another problem with this approach is that arbitrary or global values in the search requests require the administrator to create a separate enclave for each of several searches that for all intents and purposes are equivalent. This could require the administrator to create a large number of enclaves and still not achieve full coverage. It might also be unrealistic or impossible for an administrator to predict what kind of spam will be received.
One option is to give low priority to clients identified as spamming users. Alternatively, an administrator may want to raise the priority of clients that are important or of search transactions that take a long time to run. Administrators can group together similar workloads into a smaller and more manageable set of search requests and prioritize these search requests to fit their individual needs. Storing an entry in the database server for each search request is unrealistic because the list would consume a massive amount of space and would be unmanageable.
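The idea of grouping similar search requests, rather than storing an entry per request, can be sketched as follows. This is a hypothetical illustration and not the method claimed later in this document: it assumes LDAP-style filter strings, and the signature function, the WorkloadTracker class, and the flood_threshold parameter are all invented for the example. The value-stripping regular expression is deliberately crude and treats every filter value as interchangeable.

```python
import re
from collections import Counter

# Matches '=' and the value that follows it, up to the next parenthesis.
_VALUE = re.compile(r"=[^()]*")


def signature(search_filter: str) -> str:
    """Reduce a filter to its shape so requests differing only in values match."""
    return _VALUE.sub("=?", search_filter.replace(" ", "").lower())


class WorkloadTracker:
    """Keeps one counter per filter shape instead of one entry per request."""

    def __init__(self, flood_threshold: int):
        self.flood_threshold = flood_threshold  # assumed, administrator-chosen limit
        self.counts = Counter()

    def record(self, search_filter: str) -> str:
        sig = signature(search_filter)
        self.counts[sig] += 1
        return sig

    def priority(self, search_filter: str) -> str:
        # Shapes seen more often than the threshold are demoted, not blocked.
        seen = self.counts[signature(search_filter)]
        return "low" if seen > self.flood_threshold else "normal"


tracker = WorkloadTracker(flood_threshold=3)
for name in ["smith", "jones", "lee", "kim"]:
    tracker.record(f"(cn={name}*)")

print(tracker.counts)
print(tracker.priority("(cn=park*)"), tracker.priority("(mail=a@example.com)"))
```

Here four requests with different values collapse into a single tracked shape, so the bookkeeping grows with the number of distinct request shapes rather than with the number of requests, and a flooded shape is given low priority instead of being rejected outright.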
What is needed in the realm of database searches is a method, a machine, and a computer program product that enable a database server to identify search requests as being possibly problematic and then treat these requests differently. Also needed is a method and a machine to identify those search requests that should be given higher priority because of the client or the nature of the task. Thus, what is required is a dynamic and automated machine, method, and computer program product that interrogates a search request, determines whether the search request is spam, and if so, classifies and prioritizes the search request in accordance with its attributes and statistics. These needs, and others that will become apparent, are met by the invention as stated below: