In the decades since high-level computer languages were invented, the complexity and manual labor required to build software have not reduced significantly. Ironically, software has automated many industries while itself being built in a labor-intensive way. In some ways, the complexity of building software and the number of skill sets required to do it have increased over the last 20 years. Raw computing power has been increasing at the rate of Moore's law, but most of it is not being used to make computers easier to program and maintain.
New standards and tools are proliferating so quickly that even full-time experts in the software industry have a hard time keeping up with all the disparate technologies that appear regularly. The point is that today it is not easy for non-technical persons to make computers “work” for them. They have to “work” hard with many people to make computers do what they want, or try to make many disparate pieces of software work together themselves. This is the case even if they just want to understand what their application does end to end; otherwise they must rely on potentially outdated or incomplete documentation.
This evolution has led to the main problems in the software industry today, which are:
Problem 1: Communication Gaps—Requirements and Code are Out of Sync
In spite of high-level languages, rule engines, software engineering and project management best practices and tools, the biggest root causes of problems in the software industry remain “poor requirements” and “poor configuration management” practices. In large enterprises today, the business side of the house thinks the IT side does not understand its problems, and the IT side complains that it gets poor requirements. The reality is that there are too many players and hand-offs in this chain, which is only as strong as its weakest link. The bottom line is poor communication, period. No amount of current tools and best practices can completely fix this human communication problem. Anybody who has spent some time in corporate America does not need an analyst study to imagine how much money, effort, time and opportunity is being wasted due to this problem. Most software projects are late and do not deliver completely on the requirements.
Technically, this problem manifests itself as (requirements) documentation and code that are out of sync. The problem and its consequences are articulated more eloquently by many experts at http://www.literateprogramming.com/quotes_sa.html (last accessed Feb. 1, 2007; also on attached CD). It is further exacerbated by the proliferation of tools, representation formats and standards, which means that many different skill sets and tools are needed to understand the end-to-end behavior of any particular feature in a reasonably non-trivial enterprise system. Hence it is high time to question the status quo, revisit the basics and reshape the software development paradigm.
Problem 2: Software Interoperability (How to Make Software Additive?)
A related, but different, perspective is the question of how to make knowledge additive. How can we combine the knowledge in two human brains or two organizations and benefit from the sum? Today, this is a huge problem in enterprise database systems and software when usefully merging or de-duplicating data, because of referential integrity and chronological integrity, not to speak of schema mismatch issues. Another manifestation of this problem is getting two software programs written at different times by different programmers to work together, even if they are in the same programming language. This again is a pain area for enterprises when they integrate IT investments by consolidating or reusing software.
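The schema mismatch issue mentioned above can be illustrated with a minimal sketch. The record layouts, field names and mapping tables below are hypothetical and are not taken from any particular enterprise system; the sketch assumes that both systems agree on what the mapped fields mean, which is exactly the assumption that often fails in practice.

```python
# Two hypothetical customer records from systems with different schemas.
crm_record = {"cust_name": "Mary Smith", "phone": "555-0100"}
billing_record = {"fullName": "Mary Smith", "telephone": "555-0100", "balance": 42.5}

# Hand-written mappings from each source schema to a shared vocabulary.
# (Assumption: the mapped fields really do mean the same thing.)
CRM_TO_SHARED = {"cust_name": "name", "phone": "phone"}
BILLING_TO_SHARED = {"fullName": "name", "telephone": "phone", "balance": "balance"}

def to_shared(record, mapping):
    """Rename a record's fields into the shared vocabulary."""
    return {mapping[key]: value for key, value in record.items()}

def merge(records):
    """Merge records that share the same (name, phone) key, combining fields."""
    merged = {}
    for record in records:
        key = (record["name"], record["phone"])
        merged.setdefault(key, {}).update(record)
    return list(merged.values())

customers = merge([
    to_shared(crm_record, CRM_TO_SHARED),
    to_shared(billing_record, BILLING_TO_SHARED),
])
```

The two records only become additive after someone writes the mapping tables by hand; without a shared representation, that translation step has to be repeated for every pair of systems.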
Let us assume that someday we can create a ‘brain automata’ which ‘behaves’ exactly like a human brain. We know the brain stores accumulated knowledge by learning over a long period of time. Does this mean that, unless all ‘brains’ store the knowledge in the same ‘schema’ (which humans might not, depending on how, when and where they ‘learned’ the information), we might not be able to add up or use the knowledge? Factor in security (permissions), and this is an interesting problem because it could be worked on many planes, from a database perspective to solve today's problems to a human brain perspective to solve tomorrow's problems. Further research into this could even help design the ‘brain automata’ in the first place.
Note: While this effort does not attempt to understand how knowledge is stored in the brain, it focuses on providing tools that use the same constructs humans use to communicate knowledge among themselves, as opposed to a more artificial representation that is closer to machines. The inventor's thought process along these lines over the last 20 years led to this solution.
The Real Underlying Problem:
The above two problems have a commonality. The real communication challenge here is not between human and system or between system and system, but really between humans (the users, sponsors, analysts, programmers, designers, etc.). In the structured knowledge space, prior art has mostly focused on:
a) Communication between humans and computers (high level programming languages, rules engines, semantic web etc.), keeping the abstraction and representation structures of the knowledge closer to systems, which are not natural to humans, hence needing much translation (design and coding) by programmers, analysts etc. Since they were closer to machines and very artificial, they were designed in many different ways, leading to the interoperability problems.
b) Communication amongst computers (networking, Service Oriented Architectures etc.). Fortunately this has evolved over the years, and with standards like ASCII, TCP/IP, HTTP and XML, today machines can talk to each other more than ever in history.
c) Management of the communication between humans to accomplish something (software development methodologies, project management best practices and tools which attempt to bring structure and clarity after the fact, to the raw communication after it happened, either in an iterative or waterfall fashion).
But the truly difficult problem is that communication between humans does not start out clear and unambiguous. While there are many tools for unstructured communication between humans, such as word processors, email and web pages, there is no universally suitable structure for unambiguous, structured communication between humans.
The Art:
Examination of the following prior art in the fields of software development, web technologies and knowledge representation, from many different and almost unrelated perspectives, did not yield a satisfactory, simple solution that solves these problems holistically. A driving question was how knowledge would or should be captured and software built in the future, and which language and architecture will have the capacity to endure over decades and across a variety of platforms such as mobile phones, computers and other devices, in spite of the ever-changing world. Hence this invention.
JAVA, C# and other object oriented programming languages require somebody skilled in object oriented programming to use them. PHP, PERL, PYTHON, SQL and other programming languages are not meant for non-technical people. They subscribe to a two-part paradigm, specification documentation (requirements) AND code, which often get out of sync. Tools like Javadoc create documentation out of programs and the comments in them, but given the level at which they operate they are very technical in nature and neither help business users nor give an end-to-end view of the entire system (i.e., all the tiers beyond the JAVA code).
Tools such as the IBM Rational tool set provide traceability from requirements to code, but they involve committing to a vendor's tool set with a large footprint, and the result is still several tools glued together. They do not support an open representation format which can be used to enable a seamless ocean of structured knowledge and programs.
Semantic Web initiative, OWL and RDF: Per the W3C documentation, “OWL is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms.” However, the drawback again is that the OWL and RDF formats are limited to the “description” kind of knowledge using the subject-predicate-object metaphor, and describing process flows using that metaphor can be very cumbersome. They are also technically verbose and not easy for lay persons to understand.
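To make the point about process flows concrete, the following is a minimal sketch using plain Python tuples as subject-predicate-object triples. It does not use real RDF or OWL syntax; the predicate names (`is_a`, `has_field`, `action`, `next`) and step identifiers are hypothetical and chosen only for illustration.

```python
# Descriptive knowledge fits the subject-predicate-object shape directly.
description = [
    ("Invoice", "is_a", "BusinessDocument"),
    ("Invoice", "has_field", "amount"),
]

# A three-step process needs artificial step identifiers plus extra
# triples just to record the order of the steps.
process = [
    ("step1", "action", "receive invoice"),
    ("step2", "action", "approve invoice"),
    ("step3", "action", "pay invoice"),
    ("step1", "next", "step2"),
    ("step2", "next", "step3"),
]

def ordered_actions(triples, start):
    """Follow 'next' links from the start step and return actions in order."""
    actions = {s: o for s, p, o in triples if p == "action"}
    following = {s: o for s, p, o in triples if p == "next"}
    result, step = [], start
    while step is not None:
        result.append(actions[step])
        step = following.get(step)
    return result

steps = ordered_actions(process, "step1")
```

Two triples suffice to describe the invoice, whereas three sequential steps already need five triples and a traversal routine before a reader can recover the order; branches and conditions would add further bookkeeping.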
Rules engines are usually tied to a proprietary tool and representation formats that a) cannot be used independently of the tool and its reasoning mechanisms, b) cannot seamlessly interoperate with other tools, and c) do not support decentralization of the rules over the web and their aggregation on demand.
Websites such as Ehow.com describe how to do almost anything, but they do not have a standard way to describe inputs and outputs, cannot integrate multiple “how to's”, and do not interoperate with the definitions of the terms used in them.
Wikipedia.org offers free-format, mostly descriptive knowledge for humans, but it is not machine processable.
Natural language based systems: if a simple sentence like “Mary had a little lamb” can mean completely different things in different contexts, and smart humans can misunderstand it, human-made computers cannot be expected to understand it better. That is why vast efforts over the past few decades in Artificial Intelligence and Natural Language Processing (NLP) and understanding have not yielded the promised results in making sense out of “free form” unrestricted human communication.
Efforts such as OpenCyc and Mindpixel aimed at creating a universe of knowledge, but their drawback is that they rely on central storage and require a single authoritative meaning for each term, whereas in the real world the same word or phrase can mean entirely different things in different contexts for different people.
Coming from a different perspective, the industry and BPM vendors are pursuing standards such as IDIF, UML, RADs, PIF, PSL, WPDL, XPDL, XLANG, BPML and BPEL4WS, which all overlap somewhat. They approach this problem strictly from a workflow automation, business process and choreography perspective, but are light on the other aspects of knowledge representation. They are not a holistic solution: “the problem for the systems integrator is that it is not easy to transfer process information between design tools and/or work flow control software” because of the different design paradigms, and again they are not in a near natural language form. Efforts such as WfMC indicate the challenges in integrating the different technologies.
Dublin Core Meta-data Initiative is “an organization dedicated to promoting the widespread adoption of interoperable meta-data standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems.” It is focused strictly on meta-data (only about 15 fields) and on ensuring adoption of common vocabularies. It needs to be complemented and augmented by a common viewing mechanism that merges the core meta-data with other domain specific meta-data and process flow information.
Literate Programming requires actual code to be interleaved (mixed) with the documentation. Even though documentation is the main focus, the code and documentation are still separate entities, so the root cause remains unsolved. The industry has not adopted it as widely as it should have.
The promise of Model Driven Architecture (MDA) is to allow definition of machine readable application and data models which allow long-term flexibility of implementation, integration, maintenance, testing and simulation. These models are not in a near natural language and require training the users.
The Codeless platform expects an object model as its input and is not a programming tool for non-technical users.
DITA (Darwin Information Typing Architecture from IBM) is an XML-based, end-to-end architecture for authoring, producing, and delivering technical information. While DITA is based on a generic building block of a topic-oriented information architecture, it is not for building applications.
VITAL from Apple provides a technical architecture blueprint for building enterprise software. It mainly focuses on the Technical Architecture Layer of the Zachman framework, which its authors believe can come before the Business, Systems and Product Architectures. But it does not start with the business process, nor does it support a universal representation model.
IBM's Flowmark and U.S. Pat. No. 5,930,512, “Method and apparatus for building and running work flow process models using a hypertext markup language”: This invention provides a computer implemented method and system for implementing a workflow process server. The limitation here is that the language is not near natural language, nor does it support democratically generated models.
UBL: UBL, the Universal Business Language, is the product of an international effort to define a royalty-free library of standard electronic XML business documents such as purchase orders and invoices. Its vision aligns with our vision, but it is not a holistic solution.
Web Service Semantic annotation using WSDL-S provides extensibility elements (modelReference, schemaMapping, precondition, effect and category) to tie WSDL definitions to ontologies specified in a choice of representations. While this proposes mechanisms to tie WSDLs and ontologies together, it is more of a glue than a seamless single representation of semantics and services.
Tools such as “InfoPath: An XML Editor for Rich Business Processes” do not solve either of the problems (making software transparent in natural language, or getting two pieces of software to work together), i.e., they still subscribe to the paradigm of “documentation AND code”.
Tools such as Netspective provide tags to declare more and code less, but they are still in a “high level programming language”. They do not use near natural language, nor do they provide a universal, linkable representation format.
MetaL (www.meta-language.net) is shorthand for Meta-programming Language. Meta-programming is a method of developing computer programs that works by generating source code in a target language from a program specification in a higher level language. MetaL program source code is based on XML. This technology still requires users to understand the high level language used for the program specification, and it does not support a distributed model of knowledge representation.
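The general idea of generating target-language source code from an XML program specification can be sketched as follows. This is not MetaL's actual syntax or tooling: the `function`, `param` and `returns` element names and the `generate_python` helper are hypothetical, invented here only to illustrate the technique.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML program specification (not real MetaL markup).
SPEC = """
<function name="add">
  <param>a</param>
  <param>b</param>
  <returns>a + b</returns>
</function>
"""

def generate_python(spec_xml):
    """Generate Python source code from the hypothetical XML specification."""
    root = ET.fromstring(spec_xml)
    params = ", ".join(p.text for p in root.findall("param"))
    body = root.find("returns").text
    return f"def {root.get('name')}({params}):\n    return {body}\n"

# Generate the target-language source, then load and call it.
source = generate_python(SPEC)
namespace = {}
exec(source, namespace)
result = namespace["add"](2, 3)
```

Even in this small sketch, the author of the specification must already think in terms of functions, parameters and return expressions, which is the limitation noted above for non-technical users.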
U.S. Pat. No. 6,282,547, hyper-linked relational database visualization system: is oriented towards databases, but does not address the need to provide a universal knowledge representation format.
U.S. Pat. No. 6,256,618, Computer architecture using self-manipulating trees: is about making sense out of free, unconstrained natural language and does not address the need to aggregate decentralized knowledge or to provide a universal knowledge representation format.
U.S. Pat. No. 7,140,000, Knowledge oriented programming: does not solve the problems mentioned above and is not in near natural language.
U.S. Pat. No. 7,013,308, Knowledge storage and retrieval system and method: The limitations of this invention are that it is just for knowledge storage and retrieval, and cannot build applications out of that knowledge or enable process flows. It is also not fully decentralized.
US Patent Application #20050086188, “Knowledge Web”, proposes centralized, controlled, proprietary storage in a learning scenario and [37] mandates a centralized registry of that knowledge. It is not a decentralized, open representation format that could be used independently of the tool.
US Patent Application 20040220969, Methods for the construction and maintenance of a knowledge representation system: is more focused on domain specific templates and ontologies. It is not a universal near natural language knowledge representation.
Patent Application 20030217023, “Method and apparatus for extracting knowledge from software code or other structured data”: This solves the problem of inferring knowledge from existing software code. The resultant representation is a knowledge base. It appears to be a one-time reverse engineering tool, and does not indicate how the result will be maintained going forward. It does not solve the “documentation AND code” situation; it creates yet another knowledge base about the software that is being reverse engineered.
The “End-User Programming” effort at Carnegie Mellon University attempts to simplify existing programming languages and pursues workaround techniques such as programming by example, which is different from directly confronting the problem of creating a representation format for human-computer and human-human communication in near-natural languages.
The present invention addresses a number of these needs.