1. Field of the Invention
The present invention relates generally to data processing systems and, more particularly, to a computer markup language for use in a data browser and manipulator.
2. Related Art
Currently on the Internet, transmissions and communications are commonly conducted using a communication protocol called the HyperText Transfer Protocol (“HTTP”), which can be used to transfer files and documents formatted in the HyperText Markup Language (“HTML”). A markup language is a way of embedding markup “tags,” special sequences of characters that describe the structure and behavior of a document and that instruct a web browser or other program on how to display the document. Typically, documents or web pages formatted in HTML are simply ASCII text files that mix ordinary text with these markup tags.
HTML has a relatively limited structure that defines a fixed set of tags with specific purposes. Further, HTML typically works only with text and images and typically only instructs a browser on how to display a document: the browser may read and display characters but does not “understand” the data content. Even when an HTML browser displays numbers, it does not interpret them as numbers; they remain just text. Hence, HTML documents are interpreted not as “data” but as formatting instructions for displaying text and images. Users cannot “surf” through numerical data to see graphs, apply transformations, combine numbers from different web pages, or load numbers into a spreadsheet in a manageable form. An analytical program cannot read the numbers directly without human intervention to cut and paste the text, determine the data type, and so on. Conventional analytical programs, such as spreadsheet and database programs, allow ad hoc review and manipulation of abstract numbers and may perform statistical analysis, structural analysis and simple transformations on data once it has been entered and interpreted, but they do not read their data directly from online sources.
Given HTML's limited capabilities and the unwieldy complexity of the Standard Generalized Markup Language (“SGML”), a markup language called Extensible Markup Language (“XML”) was developed to help overcome some of these limitations. XML is a free-form markup language with no predefined set of tags, which allows developers to define their own tags and, in effect, create their own markup languages geared toward specialized tasks. In XML, the tags must be organized according to certain syntactic rules, but their meaning is flexible. Unlike HTML, XML describes structure and meaning, but not formatting. As such, different professions may develop their own specialized markup languages. For example, a developer creating an XML markup language that describes books could define specifically meaningful tags for “title,” “author,” and “publisher,” something not possible in HTML. Although XML's free-form structure permits the development of such markup languages, the resulting individualized languages are not compatible with one another, because tag usage is not standardized and different users apply tags to different purposes.
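As a hypothetical illustration of the book example above, the following Python sketch uses the standard-library `xml.etree.ElementTree` module to read two invented XML records that describe the same book with different tag vocabularies. The tag names, values, and the `author_of` helper are assumptions made for illustration only; the sketch shows that a program written for one vocabulary finds nothing in the other.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical record using one developer's tags. XML assigns no meaning to
# <title>, <author>, or <publisher>; their meaning exists only by convention.
BOOK_A = """
<book>
  <title>Example Title</title>
  <author>Jane Doe</author>
  <publisher>Example Press</publisher>
</book>
"""

# Hypothetical record describing the same book with another developer's tags.
BOOK_B = """
<volume>
  <name>Example Title</name>
  <writer>Jane Doe</writer>
  <imprint>Example Press</imprint>
</volume>
"""


def author_of(xml_text: str) -> Optional[str]:
    """Return the text of the <author> child, or None if the vocabulary lacks it."""
    root = ET.fromstring(xml_text.strip())
    node = root.find("author")  # searches direct children of the root only
    return node.text if node is not None else None


print(author_of(BOOK_A))  # Jane Doe
print(author_of(BOOK_B))  # None: same data, incompatible tags
```

Both records are well-formed XML, so a generic parser accepts either one; only a shared convention about tag names would let a single program interpret both.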
In today's business world, several problems that commonly accompany data manipulation increase its expense and difficulty. One such problem is that data and the documentation that describes the data are often not both in electronic form. Under this conventional approach to database and spreadsheet information, expensive database administrators are often required to perform transformations whenever data is transferred from one system to another, expensive analysis of printed documentation is required for any programming task, and the output rarely contains any indication of the original sources, structures, and manipulations that produced it. In PC-based systems, creating documentation for data is conventionally left to the user: typically there is no machine-driven effort to collect the documentation from the user, format it, and save it with the data, which makes the data harder to reuse.
Another obstacle to efficiency in conventional databases and spreadsheets is that calculations occur at too low a conceptual level. Calculations in typical numerical analysis programs operate on a single “cell” in a spreadsheet or a single “record” in a database. Operating on one value at a time can be slow and costly when many cells or record values are involved.
The lack of a standard markup language that facilitates browsing numbers leaves no way to read, automatically manipulate, and display on a single chart differing types of numerical data read from multiple online sources. Human intervention is required to recognize differing types of numerical data and conform the data so that it may be combined and displayed coherently on charts, graphs and reports. Conventionally, when series of different types of data are combined, the formatting of the graphical charts displaying them must be adjusted manually. Furthermore, no visual cue is given regarding the relationship between different numerical data sets.
The computer industry is further hindered by the fact that data and analytic routines are not standardized. While the computer industry has developed standards for file formats and function-level interfaces, it has not developed a general data format or content-analysis standards. This results in expensive translation of data between systems, industries, companies and users using different protocols.
Analysis routines in conventional spreadsheets typically take the form of “spreadsheet macros.” Macros are essentially short programs that perform well-defined, generally limited tasks. Millions of spreadsheet users have used spreadsheet macros to automate mechanical tasks involved in manipulating the numbers in their spreadsheets. But the great investment in spreadsheet macros has generally been underutilized because such macros are “write once, use once” software: they are rarely reused by others.
There are at least eight reasons that current programming languages and spreadsheet macros are not reusable or portable. One such problem is that spreadsheet data references are usually based on physical locations. Suppose a macro writer puts an interest rate assumption in cell “C4,” while another person's spreadsheet holds the interest rate assumption in cell “BR47.” A macro that expressly references the absolute cell location C4 will not be usable in the second spreadsheet.
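The C4/BR47 example can be modeled with a short, hypothetical Python sketch in which each spreadsheet is a dictionary mapping cell addresses to values. The sheet contents, the principal amount, and the `monthly_interest` function are invented for illustration. Because the macro is tied to the physical address C4, it returns the intended result on its author's sheet but silently reads an unrelated value on the second sheet.

```python
# Hypothetical spreadsheets modeled as dicts from cell address to value.
# The macro writer keeps the annual interest rate in C4; another user keeps
# it in BR47, and that user's C4 happens to hold an unrelated number.
writer_sheet = {"B4": "Interest rate", "C4": 0.06}
other_sheet = {"BQ47": "Interest rate", "BR47": 0.045, "C4": 120}


def monthly_interest(sheet: dict, principal: float) -> float:
    """Macro-style calculation hard-wired to the absolute address C4."""
    annual_rate = sheet["C4"]  # a physical location, not a meaning
    return principal * annual_rate / 12


print(monthly_interest(writer_sheet, 10_000))  # 50.0 (intended result)
print(monthly_interest(other_sheet, 10_000))   # 100000.0 (silently wrong)
```

In this model, as in a spreadsheet, the reference to C4 does not raise an error on the second sheet; it simply reads whatever occupies that cell.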
Another, related problem is that numbers in spreadsheets carry no measurement or semantic designators describing their meaning. One spreadsheet may work with dollars in millions, while another works with dollars in thousands. The same macro cannot be used on both spreadsheets without human intervention to sort out all the inconsistencies and modify one of the spreadsheets to match the other. As another example, a macro may be written to divide stock price by earnings to obtain a P/E ratio, but a number in a spreadsheet has no meaning other than the words in the cells to its left or above it. Absent a standard location and vocabulary, those labels are useless to a macro.
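The millions-versus-thousands problem can be sketched in Python as follows. The figures, the `Amount` class, and its `scale` field are hypothetical, chosen only to show that bare numbers give a macro no way to detect a unit mismatch, whereas a number that carries an explicit measurement designator can be normalized before it is combined with others.

```python
from dataclasses import dataclass

# Bare numbers, as a conventional spreadsheet stores them. Sheet A happens to
# be in millions of dollars and sheet B in thousands, but nothing records that.
sheet_a = [12.5, 14.0]
sheet_b = [9_800.0, 11_200.0]

# A macro that simply adds the cells cannot detect the mismatch.
naive_total = sum(sheet_a) + sum(sheet_b)


@dataclass
class Amount:
    """A number paired with a hypothetical measurement designator."""
    value: float
    scale: int  # 1_000 for thousands, 1_000_000 for millions of dollars

    def dollars(self) -> float:
        return self.value * self.scale


tagged_a = [Amount(v, 1_000_000) for v in sheet_a]
tagged_b = [Amount(v, 1_000) for v in sheet_b]

# With the unit attached, the same calculation can normalize before combining.
tagged_total = sum(a.dollars() for a in tagged_a + tagged_b)

print(naive_total)   # 21026.5 (meaningless mix of scales)
print(tagged_total)  # 47500000.0 dollars
```

The `scale` field stands in for the kind of measurement designator the text notes is missing; it is an illustrative assumption, not a representation prescribed by any spreadsheet product.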
An additional problem with conventional spreadsheet macros is the lack of documentation. Because a macro is typically usable only by its creator, on the single spreadsheet it was written for, macros tend to be entirely undocumented: there is no common-language description, no help file, no data standard as to permissible values, no source contact list, no license information, and so on.
Furthermore, there is no mass distribution mechanism for macros. Spreadsheet macros are not web-friendly: they are generally limited to one spreadsheet brand and one platform, do not support hyperlinks, and cannot be searched by search engines. They are also not supported by any directory or classification system, and they have no ready market.
Even further, users typically do not include unit testing, validity testing, error handling, or other end-user protections in the macros they write. As a result, users may be wary of the output of macros written by others that they might add to their own spreadsheets.
Conventional spreadsheet macros also make it difficult to build graphical interfaces to the data. End users of a macro written by someone else do not want to have to understand every cell and location constraint, every limitation on valid input values, and so forth. The lack of related graphical components compounds this problem.
Finally, conventional spreadsheet macros are either too small to be worth a marketing effort or too difficult to use to find a large audience, so there is little business incentive to create them. It is therefore desirable to overcome the aforementioned problems and other related problems.