Dynamic analysis of documents has become an important part of many computer programs. Word processors check spelling, autocorrect spelling mistakes, and analyze the grammar of sentences as a user types a document. Software editors dynamically color-code program text as a programmer types. Integrated development environments (IDEs), and the parsers that underlie them, go a step further and perform lexical analysis and statement completion on code as a programmer types. These types of programs, which we will generically refer to as editors, all share the need to incrementally execute logic over a stable snapshot of a document as it is being modified.
Interactive editors share the characteristic that a document receives input or modifications in real time. To remain responsive, editors use separate processes to provide additional features. Although the remainder of this disclosure provides examples in terms of “processes”, the present disclosure is not limited to or dependent upon any particular unit of execution. Thus, the term “process” can mean a unit of execution of any granularity, including but not limited to a running program, function, thread, processor-level thread, remote procedure call to another machine, or other computing operation.
One or more processes are responsible for receiving user input, displaying it in the editor, and building the document. Additional background processes execute other features such as auto-correction, statement completion, and color coding. This gives the user a more responsive interface, but it increases the complexity of the editor's implementation because the data entered by the user must be read and analyzed simultaneously by a number of different processes.
When data is accessed by multiple processes, synchronization is normally used to ensure that data written by one process isn't inadvertently overwritten by another process. In addition, synchronization ensures that processes analyzing the document have a stable, unchanging version to work from. Synchronization allows one process to obtain exclusive access to the data in order to make changes or complete an analysis. In this manner, only a single process may modify the resource at a given time, and only when no readers have the document locked for analysis.
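The locking discipline described above (any number of readers, or exactly one writer) can be illustrated with a minimal sketch. The class name `ReadWriteLock` and its method names are hypothetical and chosen only for illustration; the sketch uses Python threads as the unit of execution and is not taken from any particular editor.

```python
import threading


class ReadWriteLock:
    """Hypothetical readers-writer lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # processes currently analyzing the document
        self._writing = False   # True while one process is modifying it

    def acquire_read(self):
        # Readers wait only while a writer holds the document.
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, timeout=None):
        # A writer needs exclusive access: no readers and no other writer.
        # Returns False if that does not happen within `timeout` seconds.
        with self._cond:
            free = self._cond.wait_for(
                lambda: not self._writing and self._readers == 0, timeout
            )
            if free:
                self._writing = True
            return free

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

In this sketch a writer that calls `acquire_write` while an analysis still holds a read lock simply waits, which is the blocking behavior the following paragraphs identify as a performance problem.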
Synchronization requires each process to participate in a scheme in which the process may obtain a lock on the data. While locks guarantee that a process has exclusive access to the data, locks can also result in poor performance. Processes may lock data for a long period of time, forcing other processes to wait to access the data and slowing the performance of the system. This is particularly problematic when one or more of the waiting processes is responsible for updating the display and accepting user input, since the user's terminal effectively becomes inoperable during lengthy or frequent locking periods. Deadlocks can also occur, in which two or more processes each wait to access data locked by another of the processes. Deadlocks also adversely impact performance and response time, since they usually require a timeout to expire before a process decides to release its locks and try again.
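The deadlock scenario can be sketched as follows. This is an illustrative example only: the lock names `doc_lock` and `index_lock`, the helper `try_both`, and the 0.05-second timeout are assumptions made for the demonstration. Two workers take the same two locks in opposite order, and each gives up on its second lock only after the timeout expires.

```python
import threading


def try_both(first, second, barrier, results, name, timeout=0.05):
    """Hold `first`, then try to take `second`, giving up after `timeout`."""
    with first:
        barrier.wait()                         # both workers now hold one lock
        got = second.acquire(timeout=timeout)  # blocks until the timeout expires
        results[name] = got
        barrier.wait()                         # keep `first` until both attempts end
        if got:
            second.release()


def demo_deadlock():
    doc_lock, index_lock = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}
    # The workers request the same locks in opposite order.
    a = threading.Thread(target=try_both,
                         args=(doc_lock, index_lock, barrier, results, "a"))
    b = threading.Thread(target=try_both,
                         args=(index_lock, doc_lock, barrier, results, "b"))
    a.start()
    b.start()
    a.join()
    b.join()
    return results


if __name__ == "__main__":
    print(demo_deadlock())
```

Neither worker obtains its second lock, and both spend the full timeout waiting before they can release what they hold and retry. Without the timeout, both would wait indefinitely.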
Synchronization also increases code complexity. The code must be carefully written to avoid holding locks too long and starving other processes. Poor synchronization can also introduce additional bugs, some of which may be detectable only at runtime. Synchronization bugs are also notoriously difficult to reproduce, so end users often end up experiencing their adverse effects.
In addition, it is often desirable for an application to hold multiple versions of the underlying document. For example, the application may need to compare the current version of the document with a previous version, or implement an undo stack for reversing changes to a document. But keeping previous versions of a document is resource intensive, and it presents the problem of knowing when processes are no longer interested in certain versions of the document so that those versions can be discarded.
In theory, it is possible to address the issues outlined in this section by simply copying all or part of the document whenever a process requires access. However, two downsides make this approach undesirable. First, it can require a lot of memory to maintain the separate copies. Second, it can take a lot of time to copy the data, e.g., each time a new version is required. Both deficiencies get worse as the size of the data increases: the editor becomes less responsive and the developer's user experience suffers.
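The copy-per-reader approach and its cost can be sketched briefly. The function name `snapshot_for_reader` and the representation of a document as a list of lines are assumptions made only for this illustration.

```python
def snapshot_for_reader(document_lines):
    """Give a reader its own private copy of the document (hypothetical helper).

    The copy is stable, but it costs time and memory proportional to the
    size of the document every time it is taken.
    """
    return list(document_lines)


document = ["int main() {", "  return 0;", "}"]

# Each background feature takes its own full copy before analyzing.
spell_check_view = snapshot_for_reader(document)
coloring_view = snapshot_for_reader(document)

# The user keeps typing; the copies are unaffected.
document.insert(1, "  int x = 1;")

# Memory held is roughly (number of readers + 1) times the document size.
lines_held = len(document) + len(spell_check_view) + len(coloring_view)
```

Each reader sees an unchanging version without any locking, but the number of lines held in memory grows with every reader and every new version, which is the scaling problem described above.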