The present invention relates to methods for managing information in general, and to binary representation and information mining in particular.
The idea of a binary database was first introduced by Spiegler and Maayan in a seminal 1985 paper (Spiegler, I., and Maayan, R., "Storage and Retrieval Considerations of Binary Data Bases", Information Processing and Management, Vol. 21, No. 3, pp. 233-254, 1985), hereinafter "Spiegler and Maayan".
The original binary database concept described in Spiegler and Maayan proposed a method for storing and retrieving alphanumeric data found in files and databases, as an alternative to the inverted file as a storage and retrieval technique in database management.
The "binary idea" was ahead of its time. Today, the application of the binary idea in bit maps or bit vectors has come of age, with several vendors developing software to support access to, and retrieval from, databases and data warehouses. Those developments fall short of full realization of the original binary database concept, as they use bit vectors at the attribute level without linking among attributes or providing an overall binary database view.
U.S. Pat. No. 5,649,181 to French et al. describes a method for using bit vectors for indexing database columns (attributes) for the purposes of information access and retrieval. The patent was implemented in a software product called Sybase IQ, intended for use as an on-line analytical processing (OLAP) engine.
U.S. Pat. No. 5,706,495 to Chadha et al. describes the use of a vectorized index on which a series of bit-vector operations are performed for optimizing SQL queries.
Some firms today apply bit vectors in their products. For example, Sand Technologies, in a package called Nucleus, uses bit maps for high-performance ad hoc interactive queries.
The present invention carries the binary database concept to new territories and applications, which include the representation of graphs and keyword contexts, data and text mining, knowledge discovery in databases (KDD), and even a database on a chip. The binary/positive representation of data can be used to extract behavior patterns, characterize consumer segments, select symptoms identifying a certain disease, support target marketing campaigns, perform DNA analysis, and much more.
A recent article by Gelbard and Spiegler (Gelbard, R., and Spiegler, I., "Hempel's Raven Paradox: A Positive Approach to Cluster Analysis", Computers and Operations Research, Vol. 27, No. 4, April 2000), hereinafter "Gelbard and Spiegler", further advances the binary database approach and presents a model for similarity evaluation and a method for data clustering based on positive attributes of data.
The present invention carries the similarity evaluation and the clustering method further by improving similarity indexing and clustering techniques.
The present invention provides an innovative approach to the use of the binary data representation in the following areas:
FIG. 1, to which reference is now made, shows an overview 10 of new areas and applications in which the present invention is most useful.
In accordance with the present invention there is provided a knowledge tool for describing a relationship pattern between objects, comprising a binary representation of an interaction between the objects, the binary representation indicating an alleged influence of an object i on an object j by assigning a positive value to the element in the ith row and jth column of a matrix in which the objects are set in a row and column format.
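The row-and-column arrangement described above can be illustrated with a minimal sketch. The object names, the influence pairs, and the helper functions `influence_matrix` and `influenced_by` are hypothetical and serve only as an illustration; they are not taken from the invention itself.

```python
def influence_matrix(objects, influences):
    """Build a binary matrix: cell [i][j] is 1 when object i is said to influence object j."""
    index = {name: k for k, name in enumerate(objects)}
    n = len(objects)
    matrix = [[0] * n for _ in range(n)]
    for source, target in influences:
        matrix[index[source]][index[target]] = 1
    return matrix


def influenced_by(matrix, objects, name):
    """List the objects that the named object influences (the 1-cells of its row)."""
    i = objects.index(name)
    return [objects[j] for j, bit in enumerate(matrix[i]) if bit]


# Hypothetical objects and alleged influences, for illustration only.
objects = ["price", "demand", "stock"]
influences = [("price", "demand"), ("demand", "stock")]

matrix = influence_matrix(objects, influences)
for row in matrix:
    print(row)
print(influenced_by(matrix, objects, "price"))
```

Reading a row gives the objects that the row object influences; reading a column gives the objects that influence the column object.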
In accordance with the present invention there is provided a method for quantitatively evaluating a similarity or a distinction between at least two objects, comprising the stages of: (a) representing the objects by a binary representation in which the attributes of the objects are features relevant to the similarity; and (b) calculating a similarity index between the at least two objects, the similarity index being proportional to the number of positive attributes common to the at least two objects as represented by the binary representation.
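As a rough illustration of a similarity index that depends only on shared positive attributes, the sketch below counts the positions where both objects hold a 1. The normalization used here (twice the common count divided by the total number of 1s in both objects) is an assumption chosen for the example, not a formula stated in this text, and the sample objects are hypothetical.

```python
def positive_similarity(a, b):
    """Similarity index proportional to the number of 1-attributes shared by a and b.

    Assumed normalization: 2 * common / (ones in a + ones in b), giving a value in [0, 1].
    Shared 0-attributes do not contribute.
    """
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


# Hypothetical objects over five binary attributes.
raven = [1, 1, 0, 1, 0]
crow = [1, 1, 0, 0, 1]
shoe = [0, 0, 1, 0, 0]

print(positive_similarity(raven, crow))
print(positive_similarity(raven, shoe))
```

Because only common 1s are counted, two objects that merely lack the same attributes are not treated as similar.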
In accordance with the present invention there is provided a method for preserving a compression capability of a database, comprising the stages of: (a) representing the data in the database by a binary matrix; (b) interchanging the order of rows and the order of columns of the binary matrix, so as to partition said binary matrix into approximately homogeneous sub-areas containing cells of "1" or "0" only; (c) excluding said approximately homogeneous sub-areas from said binary matrix so as to obtain a reduced binary matrix, and loading said reduced binary matrix into a data storage space; (d) symbolizing the homogeneity pattern by a tree structure; and (e) changing the root of the tree structure in order to obtain a required feature of said tree structure.
In accordance with the present invention there is provided a method for grouping a plurality of objects according to their similarity, the method comprising the stages of: (a) representing the objects by a binary representation matrix with positive attribute values, in which the rows are the objects and the columns are attributes relevant to the grouping; (b) calculating an index of similarity for each pair of objects among the plurality of objects; (c) building an object similarity matrix in which the matrix element at the intersection of two objects is the index of similarity between the two objects; and (d) scanning the similarity matrix to choose pairs of objects having a similarity index of at least a pre-selected value, each chosen pair of objects constituting a distinct clustering candidate.
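Stages (a) through (d) can be sketched as follows. The similarity formula (twice the number of shared 1s divided by the total number of 1s), the sample objects, the threshold of 0.6, and the function names are all assumptions made for illustration; the text does not fix any of them.

```python
from itertools import combinations


def similarity(a, b):
    """Assumed index: 2 * shared 1s / total 1s in both rows."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


def similarity_matrix(rows):
    """Stages (b) and (c): index of similarity for every pair of objects."""
    return [[similarity(a, b) for b in rows] for a in rows]


def candidate_pairs(sim, threshold):
    """Stage (d): scan the upper triangle for pairs at or above the threshold."""
    return [(i, j) for i, j in combinations(range(len(sim)), 2)
            if sim[i][j] >= threshold]


# Stage (a): hypothetical objects (rows) over four binary attributes (columns).
names = ["A", "B", "C", "D"]
rows = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

sim = similarity_matrix(rows)
for i, j in candidate_pairs(sim, threshold=0.6):
    print(names[i], names[j], round(sim[i][j], 3))
```

Only the upper triangle is scanned because the similarity matrix is symmetric, so each pair is reported once.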
In accordance with the present invention there is provided a method for data mining, comprising the stages of: (a) defining attributes which are considered a priori, by expert opinion, to be meaningful to a score of a data mining process; (b) reading raw data from an operational database system and converting the data into objects of a binary representation in a binary matrix whose columns are the attributes; (c) performing positive clustering of the converted data according to a similarity based on the attributes, to obtain a number of groups; and (d) executing data mining within the groups.
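Stage (b), the conversion of raw records into a binary matrix, might look like the sketch below. The field names, the attribute list, and the function `to_binary_rows` are hypothetical; the sketch assumes each attribute can be expressed as a (field, value) pair chosen in advance, which is one possible reading of stage (a).

```python
def to_binary_rows(records, attributes):
    """Convert raw records into binary rows, one column per predefined attribute.

    A cell is 1 when the record's field holds the attribute's value, else 0.
    """
    return [
        [1 if record.get(field) == value else 0 for field, value in attributes]
        for record in records
    ]


# Stage (a), assumed form: attributes chosen a priori as (field, value) pairs.
attributes = [
    ("gender", "F"),
    ("gender", "M"),
    ("region", "north"),
    ("region", "south"),
]

# Hypothetical raw records as they might be read from an operational database.
records = [
    {"gender": "F", "region": "north"},
    {"gender": "M", "region": "north"},
    {"gender": "F", "region": "south"},
]

binary_rows = to_binary_rows(records, attributes)
for row in binary_rows:
    print(row)
```

The resulting rows are the objects on which the positive clustering of stage (c) would then operate.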
In accordance with the present invention there is provided a method for text mining, comprising the stages of: (a) defining attributes which comprise words considered a priori to be included in a text as an N-chain phrase; (b) reading a free-form text and performing initial parsing of the text; (c) identifying and reconstructing the binary N-chain phrase; and (d) retrieving the N-chain phrases in relevant contexts.
In accordance with the present invention there is provided a method for adaptive network addressing and routing, which comprises a binary representation of a state of connectivity between two addresses.
In accordance with the present invention there is provided a tool for data management between data warehouses and on-line analysis processors, the tool comprising a multi-dimensional binary representation in which the dimension of the representation equals or exceeds that of a three-dimensional cube.
In accordance with the present invention there is provided a method for managing a database in a storage space of a computer, the method comprising the stages of: (a) representing the data in the database by a binary matrix; (b) interchanging the order of rows and the order of columns in said binary matrix, so as to partition said binary matrix into homogeneous sub-areas containing cells of "1" or "0" only; and (c) excluding said homogeneous sub-areas from said binary matrix so as to obtain a reduced binary matrix, and loading said reduced binary matrix into the storage space of the computer.
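A heavily simplified sketch of stages (b) and (c) is given below. It assumes that rows and columns are reordered by their number of 1s, and it treats only the simplest homogeneous sub-areas: entire rows, and entire columns within the remaining rows, that contain only 0s or only 1s. The text does not prescribe this ordering rule or this restriction; both are assumptions, as is the function name `reduce_matrix`.

```python
def reduce_matrix(matrix):
    """Reorder rows and columns by their count of 1s, then drop homogeneous ones.

    Returns the reduced matrix plus the original indices of the kept rows
    and kept columns. Simplification: only whole rows and whole columns
    (all 0s or all 1s) are treated as homogeneous sub-areas.
    """
    if not matrix:
        return [], [], []
    n_cols = len(matrix[0])
    # Stage (b): interchange rows and columns so similar ones sit together.
    row_order = sorted(range(len(matrix)), key=lambda i: sum(matrix[i]))
    col_order = sorted(range(n_cols), key=lambda j: sum(row[j] for row in matrix))
    reordered = [[matrix[i][j] for j in col_order] for i in row_order]
    # Stage (c): exclude rows that are entirely 0 or entirely 1.
    keep_r = [r for r, row in enumerate(reordered) if 0 < sum(row) < n_cols]
    kept = [reordered[r] for r in keep_r]
    # Exclude columns that are homogeneous within the remaining rows.
    keep_c = [c for c in range(n_cols)
              if 0 < sum(row[c] for row in kept) < len(kept)]
    reduced = [[row[c] for c in keep_c] for row in kept]
    return reduced, [row_order[r] for r in keep_r], [col_order[c] for c in keep_c]


# Hypothetical 4 x 4 binary matrix.
matrix = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

reduced, kept_rows, kept_cols = reduce_matrix(matrix)
print(reduced, kept_rows, kept_cols)
```

A fuller implementation would also record the value of each excluded sub-area so that the original matrix could be restored; that bookkeeping is omitted here.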
It is further an object of the present invention to provide a binary representation for graphs, directed graphs, trees, automata, and connections and constraints between relations, classes, and/or records.
It is yet an object of the present invention to provide a binary representation for keywords, names (people, places, products), terms, acronyms, aliases and synonyms.
It is still an object of the present invention to provide a binary representation in contexts, hierarchies, hypertexts, and mutual links between contexts within the scope of web pages and unstructured texts.
It is further still an object of the present invention to provide a feature extraction technique based on the binary representation.
It is another object of the present invention to provide pattern recognition techniques for data based on the binary representation.