In conventional usage, “hadron” is a physics term introduced by Lev B. Okun in 1962; in particle physics, a hadron is a composite particle made of quarks. The present invention, in contrast, uses the term “hadron” in a different context, namely to refer to a specific data structure and its architecture for data storage.
Data storage is at a seemingly impassable crossroads. More than ever before, applications depend on the performance, protection, and availability of data; at the same time, the diversity of data sets, infrastructures, and topologies creates an endless array of configurations for those applications to connect with. Attempts to solve this problem have forced different applications to adhere to different proprietary data structures, making interfaces to data storage more and more complex while trying not to compromise application functionality. Complicating matters further, the highly dynamic nature of contemporary data requirements demands considerable investment in, and maintenance and support of, already implemented data structures. The result is a difficult choice: one can fight the “windmill” of proprietary data structures and suffer the limitations of current storage capabilities, or one can adopt accepted storage back-end standards and face additional work to meet a “one-size-fits-all” challenge.
The entity-attribute-value model (EAV) is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix. EAV is also known as an object-attribute-value model, vertical database model, and open schema.
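The sparse-matrix analogy can be made concrete with a small sketch. The patient records, attribute names, and the helpers `to_entity_view` and `density` below are hypothetical and only illustrate the general EAV idea: each fact is one (entity, attribute, value) row, and attributes that do not apply to an entity are simply absent rather than stored as empty cells.

```python
from collections import defaultdict

# Hypothetical EAV rows: one (entity, attribute, value) triple per fact.
# Attributes that do not apply to an entity are not stored at all.
rows = [
    ("patient_1", "blood_pressure", "120/80"),
    ("patient_1", "allergy", "penicillin"),
    ("patient_2", "heart_rate", 72),
]

def to_entity_view(triples):
    """Pivot EAV triples into {entity: {attribute: value}} dictionaries."""
    view = defaultdict(dict)
    for entity, attribute, value in triples:
        view[entity][attribute] = value
    return dict(view)

def density(triples):
    """Fraction of the full entity x attribute grid that is actually filled.

    Assumes each (entity, attribute) pair appears at most once.
    """
    entities = {e for e, _, _ in triples}
    attributes = {a for _, a, _ in triples}
    return len(triples) / (len(entities) * len(attributes))

print(to_entity_view(rows))
print(density(rows))
```

With two entities and three distinct attributes, only three of the six grid positions are filled, which is the sparsity that the EAV layout avoids storing explicitly.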
U.S. Pat. No. 7,016,900 was filed Jun. 29, 2001, issued to Boris Gelfand on Mar. 21, 2006, and is titled “Data cells and data cell generations.” U.S. Pat. No. 7,822,784 is a divisional of U.S. Pat. No. 7,016,900, and both are incorporated herein by reference. U.S. Pat. No. 7,016,900 described an OEAV data model with data cells containing an entity identifier (“O”), an entity type (“E”), an attribute type (“A”), and an attribute value (“V”). Cells with identical O and E values constitute a cell set. Pairs of synapse cells relate cell sets: each synapse cell has the O and E values of one cell set and A and V values equal to the E and O values, respectively, of the other cell set. Cell generations store information about attributes, entities, relationships, constraints, and default data formats in the same cell listing as the cells containing the actual real-world data; thus, data in a data cell can be considered self-identifying. Gelfand also described a way to normalize data using data pool cell sets. The data cells themselves can be stored in multiple, coexisting storage trees that are specialized for more efficient data queries.
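As an illustration of the O, E, A, V layout and of cell sets, here is a minimal sketch. The `Cell` type, the `cell_sets` helper, and the sample values are hypothetical and are not taken from the cited patents; in particular, the metadata cell that describes the “Name” attribute only suggests how descriptive information could share one listing with the data, and is not the patents' actual cell-generation encoding.

```python
from collections import defaultdict
from typing import NamedTuple

class Cell(NamedTuple):
    o: str      # entity identifier
    e: str      # entity type
    a: str      # attribute type
    v: object   # attribute value

def cell_sets(cells):
    """Group cells that share the same (O, E) pair into cell sets."""
    sets = defaultdict(list)
    for cell in cells:
        sets[(cell.o, cell.e)].append(cell)
    return dict(sets)

cells = [
    Cell("42", "Employee", "Name", "Ada"),
    Cell("42", "Employee", "Title", "Engineer"),
    Cell("7", "Department", "Name", "R&D"),
    # Hypothetical metadata cell stored in the same listing as the data,
    # describing the format of the "Name" attribute.
    Cell("Name", "AttributeDef", "Format", "string"),
]

for key, members in cell_sets(cells).items():
    print(key, [(c.a, c.v) for c in members])
```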
U.S. Pat. No. 7,200,600 was also filed Jun. 29, 2001, issued to Gelfand on Apr. 3, 2007, and is titled “Data cells, and a system and method for accessing data in a data cell.” U.S. Pat. No. 7,783,675 is a divisional of U.S. Pat. No. 7,200,600, and both are incorporated herein by reference. Gelfand described a method and system for storing data in data cells, each containing only a single element of data. Here again, each data cell includes four components, “O,” “E,” “A,” and “V,” and every cell contains a unique combination of O, E, A, and V. Relationships between cell sets are established by creating two synapse cells. The first synapse cell has the O and E values of the first cell set and has A and V values equal to the E and O values, respectively, of the second cell set. The second synapse cell has the O and E values of the second cell set and has as its A and V values the E and O values, respectively, of the first cell set. U.S. Pat. Nos. 7,016,900 and 7,200,600 claimed priority to U.S. Provisional Patent Application 60/215,447, filed on Jun. 30, 2000, which is incorporated herein by reference.
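The synapse-pair rule described above can be sketched as follows. The `link` helper and the sample identifiers are hypothetical; the sketch only encodes the stated rule that each synapse cell takes the O and E of one cell set and uses the E and O of the other cell set as its A and V. A Python `set` stands in for the requirement that every O, E, A, V combination be unique.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    o: str  # entity identifier
    e: str  # entity type
    a: str  # attribute type
    v: str  # attribute value

def link(first, second):
    """Return the two synapse cells relating cell sets given as (O, E) pairs."""
    o1, e1 = first
    o2, e2 = second
    forward = Cell(o1, e1, e2, o2)   # A = E of second set, V = O of second set
    backward = Cell(o2, e2, e1, o1)  # A = E of first set, V = O of first set
    return forward, backward

# A set keeps each (O, E, A, V) combination unique.
store = set()
store.update(link(("42", "Employee"), ("7", "Department")))
store.update(link(("42", "Employee"), ("7", "Department")))  # duplicate is ignored
print(sorted(store))
```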
The OEAV data model defined by U.S. Pat. Nos. 7,016,900, 7,200,600, 7,783,675, and 7,822,784 has the following deficiencies:
1. The OEAV data model is restricted to an entity-attribute (E-A) definition format: a cell set can embrace only cells that belong to one and only one E. Real-life data does not follow this format.
2. The so-called cell generations imply a generation hierarchy that does not allow one dataset to include the definition of another dataset as data, which restricts, and complicates, its use for complex data structures other than a tabular format.
3. The so-called values pool is implemented as a regular set of tables, which impairs the system implementation and may in fact negate some of the system's efficiencies.
4. The so-called synapses between cell sets are two-way links, which can create more link cells than data cells and slow the system. Example: a company has 1000 employees. Each employee has one link to the department, but the company has 1000 links to employees.
5. Because a table record is disassembled into cells, reassembling the record can be slow.
6. It is not possible to address the relationship between a cell set and a cell without creating another, segregated cell set that contains only one cell with back-links, which slows system performance.
7. Regarding Sybase IQ, Vertica, Illuminate Solutions, and the entity-attribute-value (EAV) model: these conventional approaches use a columnar representation of the relational model, with each column usually implemented as a B-tree or B+ tree index. They are built for SQL-based front-end products and have not deviated from the relational model. In fact, all of their metadata is stored in conventional data tables, and they do not store or maintain any structures other than relational tables.
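The link-count problem in deficiency 4 can be checked with a short count. This sketch assumes, hypothetically, that every employee is linked to one parent cell set (called `dept1` here, standing for the "one" side of the example) by a two-way synapse pair; the identifiers and the `two_way_links` helper are illustrative only.

```python
from collections import Counter

def two_way_links(employee_ids, parent_id):
    """Build a synapse pair (as O, E, A, V tuples) for every employee."""
    cells = []
    for emp in employee_ids:
        cells.append((emp, "Employee", "Department", parent_id))  # employee -> parent
        cells.append((parent_id, "Department", "Employee", emp))  # parent -> employee
    return cells

links = two_way_links([f"emp{i}" for i in range(1000)], "dept1")

# Count link cells owned by each (O, E) cell set.
links_per_cell_set = Counter((o, e) for o, e, _, _ in links)
print(len(links))                                   # total link cells
print(links_per_cell_set[("dept1", "Department")])  # back-links on the "one" side
```

Each employee owns a single link cell, while the parent cell set accumulates one back-link per employee, so 1000 relationships produce 2000 link cells; if each employee carried only one or two data cells, the link cells would outnumber the data cells.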
The following overview of disk topology and disk operations is very general and serves only one purpose: to explain how hadron data storage works with conventional disk technology. All information on a computer disk is stored in disk blocks (sectors), which are the units in which data is stored and retrieved on a fixed-block-architecture disk. Disk blocks have a fixed usable size and are often numbered consecutively using disk block numbers. Traditionally, each disk block (sector) holds 512 eight-bit bytes. Starting around 2011, all major hard disk drive manufacturers began releasing drive platforms using the “Advanced Format” of 4096-byte sectors with stronger error correction.
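Because data is stored and retrieved only in whole blocks, a block's byte position and a file's block count follow from simple arithmetic. The helper names below are hypothetical, and the sketch ignores partitions, file-system metadata, and 512-byte emulation on Advanced Format drives.

```python
LEGACY_SECTOR = 512            # bytes per traditional sector
ADVANCED_FORMAT_SECTOR = 4096  # bytes per Advanced Format sector

def byte_offset(block_number, sector_size):
    """Byte position of a consecutively numbered block from the start of the disk."""
    return block_number * sector_size

def sectors_needed(size_bytes, sector_size):
    """Whole sectors required to hold size_bytes (ceiling division)."""
    return -(-size_bytes // sector_size)

def slack_bytes(size_bytes, sector_size):
    """Unused bytes left in the last sector allocated to the data."""
    return sectors_needed(size_bytes, sector_size) * sector_size - size_bytes

for sector in (LEGACY_SECTOR, ADVANCED_FORMAT_SECTOR):
    print(sector, sectors_needed(10_000, sector), slack_bytes(10_000, sector))
```

For a 10,000-byte file, 512-byte sectors need 20 blocks with 240 bytes of slack, whereas 4096-byte sectors need 3 blocks with 2288 bytes of slack: larger sectors mean fewer blocks to track but more unused space in the last one.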
When an operating system is installed on a computer with raw disks, or when a new raw hard disk is connected to a computer, the disk must be formatted. Disk formatting is the process of preparing a hard disk drive for data storage. Its final result is a map, which is essentially a list of blocks with logical block addresses (LBAs); an LBA is typically just a number between 0 and N−1, where N is the total number of blocks on the drive. On a computer that uses a disk drive to store and retrieve data, the operating system uses a file system that provides a directory of files, file names, the associated LBAs, and other metadata. Typical operating systems use dynamic allocation, which allocates space to a file (adding or removing LBAs) in portions as needed.
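Dynamic allocation over a formatted LBA range can be pictured with a toy model. The `ToyFileSystem` class is hypothetical and far simpler than any real file system: it keeps a free list of LBAs 0 to N−1 and a directory that maps each file name to the LBAs currently allocated to it, adding or returning blocks as the file grows or shrinks.

```python
from collections import deque

class ToyFileSystem:
    """Hypothetical in-memory model of a directory mapping file names to LBAs."""

    def __init__(self, total_blocks):
        self.free = deque(range(total_blocks))  # LBAs 0..N-1 after formatting
        self.directory = {}                     # file name -> list of allocated LBAs

    def grow(self, name, blocks):
        """Allocate additional blocks to a file, taking them from the free list."""
        if blocks > len(self.free):
            raise OSError("disk full")
        allocated = [self.free.popleft() for _ in range(blocks)]
        self.directory.setdefault(name, []).extend(allocated)

    def shrink(self, name, blocks):
        """Return a file's most recently allocated blocks to the free list."""
        lbas = self.directory[name]
        for _ in range(min(blocks, len(lbas))):
            self.free.append(lbas.pop())

fs = ToyFileSystem(8)
fs.grow("a", 3)
fs.grow("b", 2)
fs.grow("a", 1)  # file "a" now spans non-contiguous LBAs
print(fs.directory)
```

Because space is added in portions as needed, a file's LBAs need not be contiguous, which is why the directory must record the full list of LBAs for each file.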
Some shortcomings of the above-described background information are presented below. What is needed is a better data-storage model, architecture, query language, and implementation.