The present invention relates generally to the field of automated entity, data processing, system control, and data communications. It provides a system that delivers computer-accessible benefits to communities of users and that can efficiently and robustly distribute the processing on behalf of those users over a decentralized network of computers. The field of the invention generally encompasses enabling appropriate and desired communication among communities of users and organizations, and providing information, goods, services, works, opportunities, and connections among users and organizations.
COPO is a system that provides information discovery and dissemination with both explicit and tacit collaboration. The portal provides automated information management and organization to assist a user within an organization, such as a pharmaceutical scientist, a financial analyst, or a news reporter, in their daily work. The portal is supported by a network of communities of practice, also called communities of interest (COI), in which users evolve and use new and existing knowledge, including expertise, current know-how, and provenance. Additionally, COPO is useful outside of the typical activities of knowledge workers. The current invention has manifold uses in technical and scientific fields, and enables research, information dissemination, joint development, intellectual property management, and the identification of potential collaborators in fields including computer science, mathematics, logic, linguistics, biology, chemistry, physics, health, medicine, pharmaceuticals, education, materials science, earth science, ecology, geology, and oceanography. The invention also supports government activities, including intelligence gathering, military management, and legislative, executive, and judicial decision support, as well as search and analysis of intellectual property, including patents, trademarks, and copyrights. COPO can also be used in commerce, to inform buyers and sellers of joint opportunities; in investment, to bring investment partners together and to provide initial contacts and intermediation between equity holders and equity investors, between borrowers and lenders, and between options writers and options purchasers; in market intelligence, to inform market parties about the choices, needs, and requirements of other market parties; and to provide collaborative ranking of alternative choices, offers, offerors, and customers in the marketplace.
The instant invention also provides valuable capabilities supporting joint development of intellectual property, science, and technology, especially in areas requiring the integration of ideas from multiple individuals, including development of experimental procedures, engineering design and testing, and system development, testing, documentation, maintenance, and support, including those activities for software, hardware, mechanical, hydraulic, nuclear, agricultural, oceanic, and geophysical systems. Additionally, the invention is useful in the arts and in human relations, and supports social networking for personal enjoyment, edification, or entertainment; identification of like-minded people; identification of people with complementary attributes or interests; identification of potential friends or adversaries; identification of suitable roles; and identification of people or systems providing appropriate emotional support.
COPO is particularly valuable to specialist communities facing complex problems, in that it provides connections among the appropriate members of those communities, provides both passive and active information flow along those connections, and provides a persistent dynamic representation of the state of the art for a particular domain. The complexity of scientific and technological development drives practitioners to be ever more specialized, which results in a smaller and smaller group that understands the challenges and accomplishments within a particular area of expertise. Because of this, it becomes ever more difficult to find colleagues for technical endeavors, either because the knowledge gulf is too wide for useful communication among the most appropriate known colleagues, or because the areas of knowledge and the areas of need do not match up. This problem extends beyond that of finding computer-accessible colleagues, to finding any computer-accessible benefits of interest to a practitioner. Additionally, the problem of finding appropriate computer-accessible benefits grows steadily as newly available networked resources, including people, systems, commercial entities, organizations, governments, social groups, and combinations thereof, proliferate. For all respondents, including generalized actors, collaborative problem solving and social innovation offer great potential benefit if they can be achieved practically. COPO offers such a capability, and additionally serves as an honest broker throughout the interactions by contributing classification, domain analysis, novelty analysis, time stamping, authentication, non-repudiation, connections to other trusted third parties, and confidential exchange of information.
For users of networked information systems, rapid acquisition and organization of relevant information and services is a crucial, time-consuming challenge. Health workers, scientists, news people, intelligence personnel, financial analysts, and engineers all need a system that helps them quickly acquire and organize a body of knowledge relevant to some new contagion, discovery, event, threat, or opportunity. COPO combines new technologies from the area of statistical natural language processing with social computing and multi-agent systems to provide unprecedented capabilities for information workers. Typically, when an analyst obtains a new task, they use a variety of online resources to gain an initial understanding of the relevant background information, and to learn of appropriate sources, literature, experts, and services related to their goals. Such researchers usually start with enough background information to launch initial queries of search engines and directories, acquire a first set of documents, and then digest that material to become sufficiently knowledgeable to conduct a series of ever more targeted searches. This process may require as little as 15 minutes for trivial fact-finding, or days to weeks for in-depth investigations. Additionally, this may be a recurrent or continual task for users such as epidemiologists, intelligence analysts, and scientists who continually survey a particular information domain.
Science, technology, and networked communication make new areas of specialized knowledge available at an accelerating pace. This situation guarantees that even the most persistent and capable analysts cannot stay abreast of the latest developments, and lack even the latest terminology and concepts used in the most active corners of their field. To fully realize the potential of our networked knowledge, such users must be able to exploit the organization and knowledge that are intrinsic in multiple artifacts and services available to them. Terminology-, concept-, and source-discovery must proceed rapidly and automatically, given the smallest hints of user context. There is a need, not satisfied by indexes and directories, for rapid, in-depth retrieval and organization of knowledge.
Almost all large-scale information services available to the user employ a “bag-of-terms” approach to information analysis. That is, they represent a document by an unordered collection of the terms contained in that document. Typical of these approaches are word vector representations used in pair-wise comparisons, and indices such as Google and Yahoo, which map each term to the set of documents containing the term and organize the results by (some estimate of) the popularity of the related documents. Neither word vectors nor comprehensive indices are effective at extracting domain terminology, and thus they are ill-suited to inquiries concerning new fields of study. The important terminology, idioms, and related features can be extracted from documents, but existing approaches require amounts of computation and storage that grow exponentially with tuple length. For instance, extracting 5-tuples with a vocabulary of 20,000 words (very meager for most domains) results in term vectors containing 20,000^5 = 3.2 × 10^21 potential items. Even using sparse representations (to save space), constructing and comparing these vectors is computationally expensive. The current invention uses a new approach for extracting tuples likely to contain meaningful terminology, and is able to extract such tuples in linear time with respect to the size of the input, and to store them in linear space with respect to the size of the input. This provides a new and practical scheme for the initial phase of terminology extraction, concept identification, and artifact identification. The method employs a flexible spanning criterion, which recognizes tuples of varying sizes, and employs a canonical method to build a concise, easily compared representation of unstructured text.
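The contrast drawn above can be illustrated with a minimal sketch. It is not the extraction method of the invention, which is not specified in this passage. It only shows, with hypothetical helper names (bag_of_terms, inverted_index, observed_tuples) and whitespace tokenization as a simplifying assumption, that a bag of terms discards word order, that an index maps terms to documents, and that the tuples actually present in a text number at most its length, whereas the space of potential tuples is vast.

```python
from collections import Counter, defaultdict

def bag_of_terms(text):
    """Unordered term counts; word order is discarded (illustrative helper)."""
    return Counter(text.lower().split())

def inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in bag_of_terms(text):
            index[term].add(doc_id)
    return index

def observed_tuples(text, n):
    """Contiguous n-tuples present in the text: at most len(tokens) - n + 1."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

docs = {
    "a": "dog bites man",
    "b": "man bites dog",
}

# The two documents share one bag of terms although their meanings differ.
same_bag = bag_of_terms(docs["a"]) == bag_of_terms(docs["b"])

# Ordered 2-tuples distinguish them.
same_pairs = observed_tuples(docs["a"], 2) == observed_tuples(docs["b"], 2)

# Size of the potential 5-tuple space for a 20,000-word vocabulary.
potential_5_tuples = 20_000 ** 5
```

Here same_bag is true and same_pairs is false, and potential_5_tuples equals the 3.2 × 10^21 figure cited above. A dense vector over that space is impractical, while the tuples observed in any one document are bounded by the document's length.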
Beyond extraction of important domain terminology, the current method is also valuable in construction of information retrieval systems, information routers, and in efficiently finding partial or total duplicates of unstructured text within large bodies of information.
Current Computing Technologies
Microprocessors are central processing units (CPUs) manufactured on a small number of integrated circuits (ICs), historically one CPU per IC. Overall, smaller CPU sizes translate into faster switching times, lower power consumption, and less generated heat. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU have increased dramatically. This widely observed trend is described by Moore's law, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity to date. Multicore microprocessors typically combine two or more independent microprocessors on a single IC or single chip, providing higher throughput at a given clock speed. This recent area of innovation helps drive down the computational energy losses due to generated heat, and uses power more efficiently than previous technologies. For instance, quad-core devices contain four independent microprocessors. Multicore microprocessors allow a computing device to perform processor-level parallelism or thread-level parallelism within a single physical package. Massively Parallel Processing (MPP) is a type of parallel computing in which many elements, including both memory and processing elements, work together on a software application. Grid computing is an arrangement in which multiple independent networks of computing devices act as a virtual supercomputer. Cloud computing is a special case of grid computing in which computation is offered as a service and accomplished by allocation of systems from a provisioning pool.
Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination. Parallel computing approaches fall in a spectrum spanning “small-grain” to “large-grain” parallelism. Typically, systems using “small-grain” approaches must run on specialized processors, or on processors connected by very fast (and very expensive) switches. On the other hand, systems employing “large-grain” parallel processing can run on much less specialized architectures, including clusters of commodity computers and compute nodes connected over the internet. COPO/Galaxy uses a “large-grain” parallel processing approach, and can be run efficiently on a wide variety of computing systems, including multicore processor systems, massively parallel systems, cloud computing systems, grid computing systems, desktops, laptops, notebooks, single-board computers, handheld computers, and embedded computers.
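By way of illustration only, the division of a problem into a few large, independent tasks with a final coordination step might be sketched as follows. The function names (count_terms, split, parallel_term_counts) and the term-counting workload are assumptions chosen for the example and are not drawn from COPO/Galaxy.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_terms(chunk):
    """One large-grain task: count the terms in a whole chunk of documents."""
    counts = Counter()
    for text in chunk:
        counts.update(text.lower().split())
    return counts

def split(items, parts):
    """Divide items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def parallel_term_counts(documents, workers=4):
    """Run one task per chunk, then merge the partial results."""
    chunks = split(documents, workers)
    # ThreadPoolExecutor keeps the sketch portable; for CPU-bound work in
    # CPython, ProcessPoolExecutor offers the same map() interface.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = Counter()
        for partial in pool.map(count_terms, chunks):
            total.update(partial)  # the only coordination step
    return total

documents = ["grid computing", "cloud computing",
             "parallel computing", "load balancing"]
totals = parallel_term_counts(documents, workers=2)
```

Each worker handles an entire chunk and communicates only once, when its partial result is merged. That low communication-to-computation ratio is what allows a large-grain design to tolerate slow interconnects such as commodity networks.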
Load balancing is a system design approach that attempts to keep all processors busy by moving tasks from heavily loaded processors to less loaded ones. The current invention can usefully distribute processing tasks over any conventional configuration of general-purpose computing devices. Note that inter-process communication, as used by the present invention, can be implemented on a wide variety of hardware and software layers, including buses, hubs, switches, cross-bar switches, Uniform Memory Access (UMA), Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA), Cache-Only Memory Architecture (COMA), and combinations thereof.
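The load-balancing idea described above can be sketched with a simple greedy rule. This is an assumption made for illustration, not the scheduling policy of the invention: the hypothetical function rebalance repeatedly moves one task from the most loaded processor to the least loaded one, for as long as a move narrows the gap between them. Task costs are assumed to be positive numbers.

```python
def rebalance(queues):
    """Move tasks from the most loaded to the least loaded processor.

    `queues` maps a processor name to a list of positive task costs.
    A task is moved only when the move narrows the gap between the two
    processors, so the loop always terminates.
    """
    queues = {name: list(tasks) for name, tasks in queues.items()}  # work on a copy
    while True:
        load = {name: sum(tasks) for name, tasks in queues.items()}
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        gap = load[busiest] - load[idlest]
        # Moving a task of cost c changes the gap to |gap - 2c|,
        # which is an improvement only when 0 < c < gap.
        movable = [c for c in queues[busiest] if 0 < c < gap]
        if not movable:
            return queues
        task = min(movable, key=lambda c: abs(gap - 2 * c))
        queues[busiest].remove(task)
        queues[idlest].append(task)

before = {"cpu0": [5, 4, 3, 2], "cpu1": [1], "cpu2": []}
after = rebalance(before)
```

In this example the initial loads of 14, 1, and 0 become 5, 5, and 5. A production balancer would also weigh the cost of moving a task, which depends on the interconnect in use.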