The subject application relates to software development tools and electronic document conversion. While the systems and methods described herein relate to document conversion and the like, it will be appreciated that the described techniques may find application in other software development systems, other document conversion applications, and/or other document and software editing systems.
Generic software development tools, such as workflow or program editors, Makefiles, Another Neat Tool (ANT), Maven, unit test frameworks, etc., provide a de facto methodology for iteratively defining a conversion process. This development methodology consists of repeated loops over three phases, the third of which is often underexploited in practice: editing the process definition (e.g., using a chosen syntax such as ANT, a JAVA program, a script, or a Makefile, to define the steps of the conversion from a functional point of view); test-running the process (e.g., applying the definition through a player to a given set of documents and examining the test results); and validating the process definition.
Most of the time, validation relies on a “gold reference” built from user annotations. Comparing the gold reference with the application results gives a minimal guarantee that at least some reference cases are processed correctly. However, this guarantee cannot fully ensure correctness. Such a validation methodology has a number of issues and limitations, particularly for document transformation. For instance, building a gold reference sample may not be feasible until the application has actually been built and stabilized. The gold reference sample is not necessarily representative of the entire collection to be processed. Moreover, the constraints implied by the gold reference sample do not necessarily reflect the intended target of the application under construction. Finally, the cost of building and/or updating a gold reference can become prohibitive as the transformation chain under construction evolves.
In addition, in the particular case of document transformation, this three-phase methodology requires the application designer to go back and forth between edits that operate at the level of XML nodes, tests that run at the collection level, and validation diagnoses that refer to individual documents. Managing these different levels consistently and iteratively can be tedious, misleading, and time-consuming.
A general problem in this area arises when a process is implemented by editing a workflow. For example, the programmer of a script may lack broad knowledge of the overall workflow definition. The programmer may be eager to code a quick fix, which is not conducive to quality or long-term efficiency. Often, the programmer assumes control over data and assets, which can result in errors and limitations due to a lack of checks and balances and/or peer review. Additionally, a programmer's view may lack the abstraction needed to create reusable tools. Moreover, scripts are often poorly integrated and are therefore not choreographed with other business processes.
Attempts to solve these problems have included characterizing services with respect to their contribution to the target and their potential relationships, so as to guarantee their tight compatibility and their relevance to the application. However, this approach does not provide insight into the quality of the conversion chain for a specific document collection.
Other attempts have provided a knowledge-based approach to service selection or advice based on domain-specific rules, for example for services published in a business process execution language (BPEL) environment, where services are associated with semantic descriptions that conform to a shared ontology. Again, this approach operates at the service level and does not consider performance over document collections.
These two alternative approaches may be useful at build time, providing a means to statically compile chains of typed services. Nevertheless, they are not associated with a runtime verification system, and thus do not provide, from a common specification effort, both validation-based assistance at build time and validation and supervision at production time. Accordingly, there is an unmet need for systems and/or methods that help overcome the aforementioned deficiencies.