A portion of the disclosure of this patent document contains material, which is subject to copyright protection. This patent document may show and/or describe matter, which is or may become trade dress of the owner. The copyright and trade dress owner has no objection to the facsimile reproduction by any one of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright and trade dress rights whatsoever.
1. Field of the Invention
The present invention relates to structure, organization and operation of database management systems (DBMSes).
2. Description of Related Art
Every organization that attempts to develop and maintain an electronic information system today is faced with a tremendous challenge. It is widely known that 90% of the world's information is stored in the form of emails, faxes, reports and word processing documents. The remaining 10% is stored in spreadsheets and databases. The 90% portion is considered "un-structured" and "chaotic" and cannot be easily modeled using the "row and column" format of spreadsheets and databases.
However, with the invention of the Internet and its related "tag languages" like HTML (Hyper-Text Markup Language) and XML (Extensible Markup Language), unstructured information can now be more easily modeled because documents can be written in these new tag languages and the "mission critical" data in these documents can be identified with common tags. By blending markup languages with software that can operate on the data "between the tags," a new information type is emerging called "semi-structured." Today's DBMSes are not believed to have the capability to manage "semi-structured" information.
Conventional DBMSes may be of several varieties. These include relational database management systems, SGML-based database management systems, and text-based database management systems.
Conventional relational database management systems have their benefits and drawbacks. Conventional RDBMSes are generally robust, have strong legacy connections, and are a proven technology. On the other hand, they are generally very expensive and not web-ready.
More recently, DBMSes based upon SGML have been developed. These DBMSes are generally flexible, robust, and include good publishing tools. However, they are also very expensive, and the skills to develop and manage them are hard to find.
Many DBMSes are based upon full text concepts. These DBMSes are generally flexible, low in cost, and utilize proven technology. However, they generally are not robust and are not web-ready.
Another type of DBMS diverged from the "single-file" type of DBMS. This type of DBMS stores each record of the database in a separate HTML file. Because it lacked flexibility, however, especially in its ability to satisfy the needs of large institutions, this DBMS was inadequate.
On the other hand, the benefit of hyperlinked documents distributed across and accessible from a network has become apparent. A "hyperlink" is defined as a point-and-click mechanism implemented on a computer which allows a viewer to link (or jump) from one screen display where a topic is referred to, to other screen displays where more information about that topic exists. These hyperlinked screen displays can all be of portions of the media data (media data can include, e.g., text, graphics, audio, video, etc.) from a single data file, or can be portions of a plurality of different data files; these can be stored in a single location, or at a plurality of separate locations. The hyperlink is the combination of a display element or a (generally visual) indication that a hyperlink is available for a particular hyperlink source, and a computer program which finds and displays the hyperlink destination. A hyperlink thus provides a computer-assisted way for a human user to efficiently jump between various locations containing information which is somehow related.
The term "document" is defined in a broad sense as text and other information stored in a single computer file. Documents include everything from simple short text documents to large computer multi-media databases.
The Internet, and particularly the World Wide Web, has brought hyperlinking to use over networks. A network is a collection of communicatively coupled computers. The Internet is an international network comprised of many heterogeneous sub-networks which link thousands of computers which have millions of users, many of whom are authors. The World Wide Web (sometimes simply called "the Web") is an interface and communications protocol used to make the Internet easier to use.
Nearly all DBMSes are client-server oriented, and are installed and operative on an operating system in conjunction with a file system. A computer operating system represents a collection of computer programs or routines which control the execution of application programs and that may provide services such as resource allocation, scheduling, input/output control, and data management. Most operating systems store logical units of data in files, and files are typically grouped in logical units of folders. Folders are themselves files which identify the files assigned to them and a folder can store other folders. Folders are sometimes also referred to as directories. An interrelated collection of files is called a file system.
Most file systems have not only files, but also data about the files in the file system. This data typically includes time of creation, time of last access, time of last write, time of last change, file characteristics (e.g., read-only, system file, hidden file, archive file, control file), and allocation size.
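By way of illustration only, the per-file data described above can be read through an operating system's standard interface. The following is a minimal sketch in Python, assuming a hosted environment; the function name describe_file and the dictionary keys are hypothetical and are not part of the disclosure.

```python
import os
import stat


def describe_file(path):
    """Return a dict of the data a file system keeps about one file.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,        # size of the file contents
        "last_access": st.st_atime,      # time of last access (seconds since epoch)
        "last_write": st.st_mtime,       # time of last write
        # st_ctime is the creation time on Windows and the time of the
        # last metadata change on Unix-like systems.
        "created_or_changed": st.st_ctime,
        # Read-only here means the owner write bit is not set.
        "read_only": not (st.st_mode & stat.S_IWUSR),
    }
```

Which of these fields are available, and their precise meaning, depends on the operating system and file system in use.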
Most operating systems are designed to shield applications from direct interaction with the hardware which actually stores file systems. File systems typically are stored in mass storage devices. A mass storage device is a device having a large storage capacity, and may be read-write (e.g., a hard disk drive) or read-only (e.g., a CD-ROM drive). Some mass storage devices, for example RAID systems, comprise a collection of mass storage devices. Mass storage devices also typically have the quality of non-volatility.
The storage space of a mass storage device is logically divided into one or more logical disks also known as partitions. Conversely, drivers are available which will treat a group of mass storage devices as a single logical disk. In Windows operating systems, each logical disk is served by a disk device driver which also holds a drive designation, C:, D:, E:, etc. Windows operating systems do not require a logical disk to be part of a mass storage device. For example, a RAM disk uses part of the computer's operating memory as storage for its sectors.
The task of interfacing applications to the contents of a logical disk is assigned to a file system driver. A file system driver is a collection of function routines and file management structures that perform various tasks related to files and folders stored in logical disks. The function routines of a file system driver are used to open specified files, read specific blocks of data, write specific blocks of data, and close files. A file system driver is a significant portion of an operating system. A file system driver uses the services of a disk device driver to read sectors, translate sector data and give the user lists of files stored on the hard disk drive.
The structure of data stored in a logical disk is file system-dependent. For example, the FAT file system requires a logical disk to have a boot sector that describes the location of the File Allocation Table (FAT) sectors and root directory sectors within this disk. Other file systems, such as NTFS, HPFS, etc. operate with different data structures and are incompatible with the structures of other file systems.
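The four kinds of function routines named above (open, read a block, write a block, close) can be sketched from the application's side of the interface. The following Python example is illustrative only and is not a file system driver; the 512-byte BLOCK_SIZE and all function names are assumptions chosen for the example.

```python
import os

BLOCK_SIZE = 512  # hypothetical block size, chosen only for this sketch


def open_file(path):
    """Open (creating if needed) a file for block-wise reading and writing."""
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    return os.open(path, flags, 0o600)


def write_block(fd, index, data):
    """Write exactly one block at the given block index."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    os.lseek(fd, index * BLOCK_SIZE, os.SEEK_SET)
    if os.write(fd, data) != BLOCK_SIZE:
        raise OSError("short write")


def read_block(fd, index):
    """Read one block at the given block index (may be shorter at end of file)."""
    os.lseek(fd, index * BLOCK_SIZE, os.SEEK_SET)
    return os.read(fd, BLOCK_SIZE)


def close_file(fd):
    """Release the file descriptor."""
    os.close(fd)
```

In this sketch each call is passed by the operating system to whatever file system driver serves the logical disk holding the file, which in turn relies on the disk device driver to move the underlying sectors.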
Conventional database management systems store records in one file, or in a few files in a single folder. As the amount of data to be stored in a database increases, the records are usually logically divided into more files. The location of the files is typically based upon convenience (e.g., all in the same folder) and accessibility. Accessibility usually considers speed of the underlying hardware, operating system, file system and communication paths.
In accordance with the present invention, the database consists of HTML, XML or other standard-format, hypertext documents. For each "record" in the database there is a master document and multiple related documents, called "view documents." The view documents are related to the master document and may have a subset of the data of the master document. The view documents are generally created at the same time as the master document. The master document and view documents are revised by the DBMS together. The view documents are based upon pre-defined templates. The view documents provide alternative views of the data in the master document, and may be tailored to the user or class of user. Thus, each "record" in the database is actually one or more files. The "database" is formed from a directory tree of these files, structured in a pre-defined and controlled manner.
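One possible arrangement of a master document and its view documents can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the implementation of the invention: the field names, the template contents, the view names "public" and "hr", the file names, and the one-folder-per-record layout are all hypothetical.

```python
import html
import os
from string import Template

# Hypothetical pre-defined templates. The master holds every field;
# each view holds only a subset, tailored to a class of user.
MASTER_TEMPLATE = Template(
    "<html><body><h1>$name</h1><p>$email</p><p>$salary</p></body></html>"
)
VIEW_TEMPLATES = {
    "public": Template("<html><body><h1>$name</h1></body></html>"),
    "hr": Template("<html><body><h1>$name</h1><p>$salary</p></body></html>"),
}


def write_record(root, record_id, fields):
    """Create or revise the master and all view documents of one record together.

    Returns a dict mapping "master" and each view name to the file written.
    """
    record_dir = os.path.join(root, record_id)  # one folder per record (assumed)
    os.makedirs(record_dir, exist_ok=True)
    # Escape field values so they cannot break the surrounding markup.
    safe = {key: html.escape(str(value)) for key, value in fields.items()}

    documents = {"master": MASTER_TEMPLATE}
    documents.update(VIEW_TEMPLATES)

    paths = {}
    for name, template in documents.items():
        path = os.path.join(record_dir, name + ".html")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(template.substitute(safe))
        paths[name] = path
    return paths
```

Calling write_record again with changed fields rewrites the master document and every view document in the same pass, which mirrors the statement that the documents are revised together.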