Database programs are one of the most widely used and useful applications of computers. Data records may be stored in database tables that are linked to one another in a relational database. Queries from users allow database programs to locate matching records and display them to users for modification. Often a large number of users access different records in a database simultaneously.
Database records are typically stored on rotating hard disks. Hard-disk storage densities have grown rapidly and have kept pace with a substantial increase in storage requirements. Disk performance, however, has not kept pace: access time and rotational speed, key performance parameters in database applications, have improved only incrementally in the last 10 years.
Web sites on the Internet may link to vast amounts of data in a database, and large web server farms may host many web sites. Storage Area Networks (SANs) are widely used as a centralized data store. Another widespread storage technology is Network Attached Storage (NAS). These disk-based technologies are now widely deployed but consume substantial amounts of power and can become a central-resource bottleneck. The recent rise in energy costs makes further expansion of these disk-based server farms undesirable. Newer, lower-power technologies are desirable.
FIG. 1 highlights a prior-art bottleneck problem with a distributed web-based database server. A large number of users access data in database 16 through servers 12 via web 10. Web 10 can be the Internet, a local Intranet, or other network. As the number of users accessing database 16 increases, additional servers 12 may be added to handle the increased workload. However, database 16 is accessible only through database server 14. The many requests to read or write data in database 16 must funnel through database server 14, creating a bottleneck that can limit performance.
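The funnel effect described for FIG. 1 can be illustrated with a deliberately simplified capacity model. The sketch below is not taken from the figure. The function name and the request rates are hypothetical, and the model assumes that every request handled by a server 12 results in exactly one request to database server 14.

```python
def system_throughput(num_servers, per_server_rate, db_server_rate):
    """Sustainable requests/second in a simplified single-funnel model.

    Assumption (not from the source): each request handled by a front-end
    server produces exactly one request to the single database server.
    """
    front_end_capacity = num_servers * per_server_rate
    # The single database server caps the whole system.
    return min(front_end_capacity, db_server_rate)


if __name__ == "__main__":
    # Hypothetical rates chosen only to show the plateau.
    for n in (1, 2, 4, 8, 16):
        print(n, system_throughput(n, per_server_rate=500, db_server_rate=2000))
```

Under these assumed rates, adding servers 12 beyond the fourth yields no further throughput, because all requests still pass through the one database server.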
FIG. 2 highlights a coherency problem when a database is replicated to reduce bottlenecks. Replicating database 16 by creating a second database 16′ that is accessible through second database server 14′ can reduce the bottleneck problem by servicing read queries. However, a new coherency problem is created with any updates to the database. One user may write a data record on database 16, while a second user reads a copy of that same record on second database 16′. Does the second user read the old record or the new record? How does the copy of the record on second database 16′ get updated? Complex distributed database software or a sophisticated scalable clustered hardware platform is needed to ensure coherency of replicated data accessible by multiple servers.
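The stale-read scenario of FIG. 2 may be sketched as follows. This is a minimal illustration, assuming that updates reach the second database through a deferred (asynchronous) replication step. The class and method names are hypothetical and do not correspond to any particular database product.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of database 16 (primary) and a copy, database 16' (replica).

    Assumption: replication is deferred, so a write is visible on the
    primary immediately but on the replica only after replicate() runs.
    """

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # updates not yet applied to the replica

    def write(self, key, value):
        # Writes go to the primary only and are queued for the replica.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # Reads served by the second database see only replicated data.
        return self.replica.get(key)

    def replicate(self):
        # Apply queued updates to the replica in the order they were written.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("record", "old")
    store.replicate()
    store.write("record", "new")         # first user updates the primary
    print(store.read_replica("record"))  # second user still sees the old copy
    store.replicate()
    print(store.read_replica("record"))  # the copy is now current
```

Between the write and the next replication step, the second user reads the old record. Closing that window is the task that the distributed database software or clustered hardware platform must perform.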
Adding second database 16′ increases power consumption, since a second set of disks must be rotated and cooled. Operating the motors that physically spin the hard disks, and running the fans and air conditioners that cool them, requires a substantial amount of power.
It has been estimated (by J. Koomey of Stanford University) that aggregate electricity use for servers doubled from 2000 to 2005, both in the U.S. and worldwide. Total power for servers and the required auxiliary infrastructure represented about 1.2% of total U.S. electricity consumption in 2005. As the Internet and its data storage requirements appear to grow exponentially, these power costs can be expected to rise substantially.
Flash memory has replaced floppy disks for personal data transport. Many small key-chain flash devices are available that can each store a few GB of data. Flash storage may also be used for data backup and some other specialized applications. Flash memory uses much less power than rotating hard disks, but the different interfacing requirements of flash have limited its use in large server farms. Flash memory's random-access bandwidth and latency are orders of magnitude better than those of rotating disks, but the slow write time of flash memory relative to its read time complicates the coherency problem of distributed databases.
Balancing workloads among the servers is also problematic. Database server 14 may become busy processing a particularly slow or difficult user query. Incoming user queries could be assigned in a round-robin fashion among database servers 14, 14′, but then half of the incoming queries would back up behind the slow query in database server 14.
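The round-robin backup described above can be illustrated with a small calculation. The sketch below makes simplifying assumptions that are not in the text: all queries arrive at the same instant, each server processes its queries one at a time in arrival order, and the service times are hypothetical values in arbitrary time units.

```python
def round_robin_waits(service_times, num_servers=2):
    """Return the queueing delay of each query under round-robin assignment.

    Assumptions: all queries arrive at time 0 in list order, and each
    server handles its assigned queries strictly one after another.
    """
    free_at = [0] * num_servers  # time at which each server next becomes idle
    waits = []
    for i, service in enumerate(service_times):
        server = i % num_servers        # strict alternation among servers
        waits.append(free_at[server])   # query waits until its server is idle
        free_at[server] += service
    return waits


if __name__ == "__main__":
    # One slow query (100 units) followed by five fast ones (1 unit each).
    queries = [100, 1, 1, 1, 1, 1]
    for i, wait in enumerate(round_robin_waits(queries)):
        print(f"query {i} -> server {i % 2}, waits {wait}")
```

In this example, every later query assigned to the server holding the slow query waits at least 100 units, while queries on the other server wait at most 2 units, even though that server is idle for most of the interval.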
Improvements in cost, performance, and reliability in data processing systems are made possible by flash memory and other high-speed, high-density, solid-state storage devices. These improvements are of limited benefit in some scalable cluster systems where data must be partitioned across multiple processing nodes and locally accessed, or placed on a dedicated Storage Area Network, or shared through application inter-process communication. The overhead involved in these existing techniques consumes much of the performance and cost advantage inherent in high-density solid-state memory.