The invention generally relates to computers, computer data, and the metadata of computer data. In particular, the invention provides a method and system for encoding data and metadata, for example, to support run-time checking in programming languages and to serialize computer data.
The information age has been made possible by computers and the software that executes on them. There are many types of software, and software is used in many industries, including business, commerce, education, law, entertainment, medicine, finance, mathematics, and energy. To write software, programmers use programming languages to write their code and compilers to compile it (or interpreters to execute it). Examples of programming languages include C, C++, C#, Java, ML, OCaml, Haskell, JavaScript, COBOL, Fortran, Prolog, Pascal, Ada, Forth, BASIC, Perl, Ruby, and Python, among many others. Despite the widespread success of programming languages and their compilers and interpreters, vulnerabilities in programming languages and their compiled code lead to potential security flaws and other problems.
Programs written in type-unsafe programming languages such as C, C++, and assembly languages do not keep the metadata of their data (including, but not limited to, type information) in their run-time environments. Without metadata, run-time checks cannot verify whether a given use of data is actually consistent with that data's metadata.
The lack of run-time checking in type-unsafe programming languages such as C, C++, and assembly languages may compromise the security, integrity, and reliability of the computer systems that run programs written in these languages. For example, in a type-unsafe language, a program may write more data into a buffer, or read more data from it, than the buffer can hold, because the language does not check at run time whether each write or read stays within the buffer's boundary. This phenomenon is commonly called a buffer overrun or buffer overflow. Hackers can intentionally trigger buffer overruns to break into systems, inject malicious programs, and/or obtain superuser privileges starting from user-level accounts.
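To make the mechanism concrete, the following C sketch shows how keeping a single piece of metadata, the buffer's capacity, together with the buffer lets each write be checked at run time instead of silently running past the end. The `checked_buf` type and `checked_write` function are hypothetical names chosen for illustration; the sketch assumes capacity is the only metadata tracked and is not intended to represent the encoding method of the invention.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer descriptor: the data pointer travels together with
 * its capacity, a minimal piece of run-time metadata that plain C arrays
 * and pointers do not carry. */
typedef struct {
    char  *data;
    size_t capacity; /* number of bytes the buffer can hold */
} checked_buf;

/* Copy len bytes from src into buf starting at offset off.
 * Returns 0 on success, or -1 if the write would fall outside the buffer.
 * A bare memcpy(buf->data + off, src, len) performs no such check and
 * would overrun the buffer on the same inputs. */
int checked_write(checked_buf *buf, size_t off, const void *src, size_t len)
{
    /* The comparison is arranged so that off + len is never computed,
     * which avoids unsigned wrap-around on very large arguments. */
    if (off > buf->capacity || len > buf->capacity - off) {
        fprintf(stderr,
                "run-time check: write of %zu bytes at offset %zu "
                "exceeds capacity %zu\n",
                len, off, buf->capacity);
        return -1;
    }
    memcpy(buf->data + off, src, len);
    return 0;
}
```

For example, with an 8-byte buffer, a 4-byte write at offset 4 succeeds, while the same write at offset 5 is rejected instead of overwriting whatever memory follows the buffer.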
When a computer system runs a program that can overrun its buffers, the system is vulnerable to buffer overrun attacks. Since 1988, when the Morris Internet worm attacked computer systems by exploiting a buffer overrun vulnerability in fingerd (a program written in C), buffer overrun attacks have become prevalent threats to the security, integrity, and reliability of computer systems worldwide. From 2000 to 2007, roughly one in every three computer security attacks exploited buffer overrun vulnerabilities. Remediation of the Blaster worm cost an average of $475,000 per company, with larger companies reporting costs of up to $4.2 million. The Conficker worm is estimated to have breached 16 million computers as of February 2009.
In addition to buffer overruns, C, C++, and assembly languages have many other vulnerabilities. With existing vulnerabilities unchecked, new vulnerabilities being introduced in the future, and no viable way to prevent future program vulnerabilities from being exploited, the number and economic cost of attacks on computer systems will continue to increase.
Therefore, there is a need for a method and system for encoding data objects and their metadata to support run-time checking of computer code. Run-time checking can either terminate the execution of computer code when errors occur or produce information about the execution of that code.
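As a minimal sketch of the two outcomes described above, the following C code stores a type tag with each value and checks the tag whenever the value is read as an `int`. Under one policy a mismatch terminates the program; under the other the mismatch is reported and counted while execution continues. The names `tagged_value`, `check_policy`, `g_policy`, `g_violations`, and `read_int` are hypothetical, and a single type tag stands in for richer metadata; this is an illustration under those assumptions, not the encoding of the invention.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical type tags recorded alongside each value. */
typedef enum { TAG_INT, TAG_DOUBLE } type_tag;

/* A data object encoded together with its metadata (here, only a type tag). */
typedef struct {
    type_tag tag;
    union { int i; double d; } as;
} tagged_value;

/* What to do when a check fails: stop the program, or report and continue. */
typedef enum { ON_ERROR_TERMINATE, ON_ERROR_REPORT } check_policy;

check_policy g_policy = ON_ERROR_REPORT; /* default: report and continue */
unsigned g_violations = 0;               /* number of reported mismatches */

/* Verify that using v as type `expected` is consistent with its metadata.
 * Returns 1 if the tags match. On a mismatch, either aborts (terminate
 * policy) or logs to stderr, increments g_violations, and returns 0. */
static int check_tag(const tagged_value *v, type_tag expected)
{
    if (v->tag == expected)
        return 1;
    fprintf(stderr, "type check failed: expected tag %d, found tag %d\n",
            (int)expected, (int)v->tag);
    if (g_policy == ON_ERROR_TERMINATE)
        abort();
    g_violations++;
    return 0;
}

/* Read v as an int. Under the report policy, a mismatch yields 0
 * instead of reinterpreting the bytes of a double as an int. */
int read_int(const tagged_value *v)
{
    return check_tag(v, TAG_INT) ? v->as.i : 0;
}
```

The report policy suits testing and diagnostics, where a log of every violation is useful; the terminate policy suits deployed code, where continuing after a detected type error could be exploited.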