Since the earliest history, various institutions (e.g., governments and private companies alike) have recorded their actions and transactions. Subsequent generations have used these archival records to understand the history of the institution, the national heritage, and the human journey. These records may be essential to support the efficiency of the institution, to protect the rights of individuals and businesses, and/or to ensure that the private company or public corporation/company is accountable to its employees/shareholders and/or that the Government is accountable to its citizens.
With the advance of technology into a dynamic and unpredictable digital era, evidence of the acts and facts of institutions and the government, and of our national heritage, is at risk of being irrecoverably lost. The challenge is pressing: as time moves forward and technologies become obsolete, the risks of loss increase. It will be appreciated that a need has arisen in the art for an electronic records archives system and method, especially, but not only, for the National Archives and Records Administration (NARA) in a system known as Electronic Records Archives (ERA), to resolve this growing problem in a way that is substantially obsolescence-proof and policy neutral.
Several organizations and governing bodies have tried to solve the issue of digital preservation. The Victorian Electronic Records Strategy (VERS), developed by the Public Record Office of Victoria (PROV), Australia, mandated that a single format (e.g., PDF/A) be used as the universal format for preserving digital records. However, this imposes a format limitation that is not practical when the behavior of an object is an essential characteristic.
While embodiments of the invention will be described with respect to its application for safeguarding government records, the described embodiments are not limited to archives systems applications nor to governmental applications and can also be applied to other large scale storage applications, in addition to archives systems, and for businesses, charitable (e.g., non-profit) and other institutions, and entities.
An aspect of the ERA according to the present invention is to preserve and to provide ready access to authentic electronic records of enduring value.
In one aspect, the ERA supports and flows from NARA's mission to ensure “for the Citizen and the Public Servant, for the President and the Congress and the Courts, ready access to essential evidence.” This mission facilitates the exchange of vital ideas and information that sustains the United States of America. NARA is responsible to the American people as the custodian of a diverse and expanding array of evidence of America's culture and heritage, of the actions taken by public servants on behalf of American citizens, and of the rights of American citizens. The core of NARA's mission is that this essential evidence must be identified, preserved, and made available for as long as authentic records are needed—regardless of form.
The creation and use of an unprecedented and increasing volume of Federal electronic records—in a wide variety of formats, using evolving technologies—poses a problem that the ERA must solve. An aspect of the present invention involves an integrated ERA solution supporting NARA's evolving business processes to identify, preserve, and make available authentic, electronic records of enduring value—for as long as they are needed.
In another aspect, the ERA can be used to store, process, and/or disseminate a private institution's records. That is, the ERA may store records pertaining to a private institution or association, and/or the ERA may be used by a first entity to store the records of a second entity. System solutions, no matter how elegant, may be integrated with the institutional culture and organizational processes of the users.
Since 1934, NARA has developed effective and innovative processes to manage the records created or received, maintained or used, and destroyed or preserved in the course of public business transacted throughout the Federal Government. NARA played a role in developing this records lifecycle concept and related business processes to ensure long-term preservation of, and access to, authentic archival records. NARA also has been instrumental in developing the archival concept of an authentic record that consists of four fundamental attributes: content, structure, context, and presentation.
NARA has been managing electronic records of archival value since 1968, longer than almost anyone in the world. Despite this long history, the diverse formats and expanding volume of current electronic records pose new challenges and opportunities for NARA as it seeks to identify records of enduring value, preserve these records as vital evidence of our nation's past, and make these records accessible to citizens and public servants in accordance with statutory requirements.
The ERA should support, and may affect, the institution's (e.g., NARA's) evolving business processes. These business processes mirror the records lifecycle and are embodied in the agency's statutory authority:
    Providing guidance to Federal Agencies regarding records creation and records management;
    Scheduling records for appropriate disposition;
    Storing and preserving records of enduring value; and/or
    Making records available in accordance with statutory and regulatory provisions.
Within this lifecycle framework, the ERA solution provides an integrated and automated capability to manage electronic records from: the identification and capture of records of enduring value; through the storage, preservation, and description of the records; to access control and retrieval functions.
Developing the ERA involves far more than just warehousing data. For example, the archival mission is to identify, preserve, and make available records of enduring value, regardless of form. This three-part archival mission is the core of the Open Archival Information System (OAIS) Reference Model, expressed as ingest, archival storage, and access. Thus, one ERA solution is built around the generic OAIS Reference Model (presented in FIG. 1), which supports these core archival functions through data management, administration, and preservation planning.
The ERA may coordinate with the front-end activities of the creation, use, and maintenance of electronic records by Federal officials. This may be accomplished through the implementation of disposition agreements for electronic records and the development of templates or schemas that define the content, context, structure, and presentation of electronic records along with lifecycle data referring to these records.
The ERA solution may complement NARA's other activities and priorities, e.g., by improving the interaction between NARA staff and their customers (in the areas of scheduling, transfer, accessioning, verification, preservation, review and redaction, and/or ultimately the ease of finding and retrieving electronic records).
Like NARA itself, the scope of ERA includes the management of electronic and non-electronic records, permanent and temporary records, and records transferred from Federal entities as well as those donated by individuals or organizations outside of the government. Each type of record is described and/or defined below.
ERA and Non-Electronic Records: Although the focus of ERA is on preserving and providing access to authentic electronic records of enduring value, the system's scope also includes, for example, management of specific lifecycle activities for non-electronic records. ERA will support a set of lifecycle management processes (such as those used for NARA) for appraisal, scheduling, disposition, transfer, accessioning, and description of both electronic and non-electronic records. A common systems approach to appraisal and scheduling through ERA will improve the efficiency of such tasks for non-electronic records and help ensure that permanent electronic records are identified as early as possible within the records lifecycle. This same common approach will automate aspects of the disposition, transfer, accessioning, and description processes for all types of records, which will result in significant workflow efficiencies. Archivists, researchers, and other users may realize benefits by having descriptions of both electronic and non-electronic records available together in a powerful, universal catalog of holdings. In an embodiment, some of ERA's capabilities regarding non-electronic records may come from subsuming the functionality of legacy systems such as the Archival Research Catalog (ARC). To effectively manage lifecycle data for all types of records, in certain embodiments, ERA also may maintain data interchange with (but not subsume) other legacy systems and likely future systems related to non-electronic records.
Permanent and Temporary Records: There is a fundamental archival distinction between records of enduring historic value, such as those that NARA must retain forever (e.g., permanent records) and those records that a government must retain for a finite period of time to conduct ongoing business, meet statutory and regulatory requirements, or protect rights and interests (e.g., temporary records).
For a particular record series from the U.S. Federal Government, NARA identifies these distinctions during the record appraisal and scheduling processes and they are reflected in NARA-approved disposition agreements and instructions. Specific records are actually categorized as permanent or temporary during the disposition and accessioning processes. NARA takes physical custody of all permanent records and some temporary records, in accordance with approved disposition agreements and instructions. While all temporary records are eventually destroyed, NARA ultimately acquires legal (in addition to physical) custody over all permanent records.
ERA may address the distinction between permanent and temporary records at various stages of the records life-cycle. ERA may facilitate an organization's records appraisal and scheduling processes where archivists and transferring entities may use the system to clearly identify records as either permanent or temporary in connection with the development and approval of disposition agreements and instructions. The ERA may use this disposition information in association with the templates to recognize the distinctions between permanent and temporary records upon ingest and manage these records within the system accordingly.
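By way of illustration only, the use of disposition information to distinguish permanent from temporary records upon ingest might be sketched as follows. The names DispositionAgreement and classify_on_ingest, the series identifiers, and the "unscheduled" fallback are hypothetical and are assumed solely for this example; they are not taken from any described embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DispositionAgreement:
    """Hypothetical approved disposition for one record series."""
    series_id: str
    permanent: bool
    # Retention period in years; only meaningful for temporary records.
    retention_years: Optional[int] = None


def classify_on_ingest(series_id: str,
                       agreements: Dict[str, DispositionAgreement]) -> str:
    """Return 'permanent', 'temporary', or 'unscheduled' for a record series.

    'unscheduled' is an assumed fallback for a series that has no
    approved disposition agreement on file.
    """
    agreement = agreements.get(series_id)
    if agreement is None:
        return "unscheduled"
    return "permanent" if agreement.permanent else "temporary"


if __name__ == "__main__":
    # Illustrative data only.
    agreements = {
        "RS-001": DispositionAgreement("RS-001", permanent=True),
        "RS-002": DispositionAgreement("RS-002", permanent=False,
                                       retention_years=7),
    }
    for series in ("RS-001", "RS-002", "RS-999"):
        print(series, classify_on_ingest(series, agreements))
```

In this sketch the classification is looked up rather than inferred from the record itself, reflecting that the permanent or temporary status originates in the approved disposition agreement.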
For permanent records this may involve transformation to persistent formats or use of enhanced preservation techniques to ensure their preservation and accessibility forever. For temporary records, NARA's Records Center Program (RCP) is exploring offering its customers an ERA service to ingest and store long-term temporary records in persistent formats. To the degree that the RCP opts to facilitate its customers' access to the ERA for appropriate preservation of long-term temporary electronic records, this same coordination relationship with transferring entities through the RCP will allow NARA to effectively capture permanent electronic records earlier in the records lifecycle. In the end, ERA may also provide for the ultimate destruction of temporary electronic records.
ERA and Donated Materials: In addition to federal records, NARA also receives and accesses donated archival materials. Such donated collections comprise a significant percentage of NARA's Presidential Library holdings, for example. ERA may manage donated electronic records in accordance with deeds of gift or deposit agreements which, when associated with templates, may ensure that these records are properly preserved and made available to users. Although donated materials may involve unusual disposition instructions or access restrictions, ERA should be flexible enough to adapt to these requirements. Since individuals or institutions donating materials to NARA are likely to be less familiar with ERA than federal transferring entities, the system may also include guidance and tools to help donors and the NARA appraisal staff working with them ensure proper ingest, preservation, and/or dissemination of donated materials.
Systems are designed to facilitate the work of users, and not the other way around. One or more of the following illustrative classes of users may interact with the ERA: transferring entity; appraiser; records processor; preserver; access reviewer; consumer; administrative user; and/or a manager. The ERA may take into account data security, business process re-engineering, and/or systems development and integration. The ERA solution also may provide easy access to the tools the users need to process and use electronic records holdings efficiently.
NARA must meet challenges relating to archival of massive amounts of information, or the American people risk losing essential evidence that is only available in the form of electronic federal records. But beyond mitigating substantial risks, the ERA affords such opportunities as:
    Using digital communication tools, such as the Internet, to make electronic records holdings, such as NARA's, available beyond the research room walls in offices, schools, and homes throughout the country and around the world;
    Allowing users to take advantage of the information-processing efficiencies and capabilities afforded by electronic records;
    Increasing the return on the public's investment by demonstrating technological solutions to electronic records problems that will be applied throughout our digital society in a wide variety of institutional settings; and/or
    Developing tools for archivists to perform their functions more efficiently.
The challenges faced by NARA are typical of broader archival problems and reveal drawbacks associated with known solutions. Thus, in an embodiment, an ERA may be provided to address some or all of the more general problems. In particular, archives systems exist for storing and preserving electronic assets, which are stored as digital data. Typically, these assets are preserved for a period of time (retention time) and then deleted. These systems maintain metadata about the assets in asset catalogs to facilitate asset management. Such metadata may include one or more of the following:
    Attributes to uniquely identify assets;
    Attributes to describe assets;
    Attributes to facilitate search through the archives;
    Attributes to define asset structure and relationships to other assets;
    Attributes to organize assets;
    Attributes for asset protection;
    Attributes to maintain information about asset authenticity; and/or
    Status of the asset lifecycle (e.g., planning receipt of asset through eventual deletion).
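By way of illustration only, one entry in such an asset catalog might be sketched as the following record, with one field per category of metadata listed above. All field names, the lifecycle states, and the choice of a SHA-256 checksum as the authenticity attribute are assumptions made for this example and are not drawn from any particular archives system.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class LifecycleStatus(Enum):
    """Assumed lifecycle states, from planned receipt to deletion."""
    PLANNED = "planned"
    RECEIVED = "received"
    PRESERVED = "preserved"
    DELETED = "deleted"


@dataclass
class AssetRecord:
    asset_id: str                                       # uniquely identifies the asset
    title: str                                          # describes the asset
    keywords: List[str] = field(default_factory=list)   # facilitates search
    parent_id: Optional[str] = None                     # structure and relationships
    collection: Optional[str] = None                    # organization of assets
    access_level: str = "restricted"                    # asset protection
    checksum_sha256: Optional[str] = None               # authenticity information
    status: LifecycleStatus = LifecycleStatus.PLANNED   # lifecycle status
    retain_until: Optional[date] = None                 # None means indefinite retention

    def is_due_for_deletion(self, today: date) -> bool:
        """True once the retention time has elapsed; never for indefinite retention."""
        return self.retain_until is not None and today >= self.retain_until


if __name__ == "__main__":
    # Illustrative data only.
    memo = AssetRecord(asset_id="A-0001", title="Budget memo",
                       retain_until=date(2030, 1, 1))
    print(memo.is_due_for_deletion(date(2031, 6, 1)))
```

Representing indefinite retention as an absent date, rather than as a very distant one, keeps the deletion check from ever firing for assets that are to be kept without limit.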
Unfortunately, these systems all suffer from several drawbacks. For example, there are limitations relating to the scale of the assets managed and, in particular, the size and number of all the assets maintained. These systems also have practical limitations in the duration in which they retain assets. Typically, archives systems are designed to retain data for years or sometimes decades, but not longer. As retention times of assets become very long or indefinite, longevity of the archives system itself, as well as the assets archived, is needed because an archives system's basic requirement is to preserve assets.
Indefinite longevity of an archives system and its assets poses challenges. For example, providing access to old electronic assets is complicated by obsolescence of the asset's format. Regular upgrades of the archives system itself, including migrations of asset data and/or metadata to new storage systems, are complicated by the extreme size of the assets managed. For example, if the metadata has to be redesigned to handle new required attributes or to handle an order of magnitude greater number of assets than supported by the old design, then the old metadata generally will have to be migrated to the new design, which could entail a great deal of migration. Extreme scale and longevity make impractical those archives systems that are not designed to accommodate unknown, future changes and to reduce the impact of necessary change as much as possible.
Archives systems today are built on top of underlying storage systems based on commercial products that are typically comprised of file systems (e.g., Sun's ZFS file system) or relational databases (e.g., Oracle), and sometimes proprietary systems (e.g., EMC Centera). All of these storage systems have limitations in terms of scale (though sometimes the limits can be quite high). In some cases, there may be no products that can make use of the full scale of available file systems. Few of these systems can scale to trillions of entries (e.g., files). Limitations arise for different reasons but can be related to one or more of the following factors, alone or in combination:
    Limitations of object or file identification schemes (e.g., uniqueness of identifiers; www.doi.org provides background on the state of the art for electronic/digital entity identifiers);
    Catalog limitations (e.g., number of entries, design bottlenecks);
    The number of storage subsystems that can be integrated (sometimes termed horizontal scalability);
    The capacity of underlying storage technologies;
    Search and retrieval performance considerations (e.g., search can become impractical with extreme size);
    The ability to distribute system components (e.g., systems can be difficult to distribute geographically); and/or
    Limitations of system maintenance tasks that are a function of system size (e.g., systems can become impractical to administer with extreme size).
Currently, relational databases (DBs) can scale only to 10 billion objects per instance. Relational DBs also generally do not perform as well as file systems for simple search and retrieval tasks because they tend to introduce additional overhead to meet other requirements, such as fine-grained transactional integrity. There is also no viable product that integrates multiple file systems in a way that provides both extreme scaling and longevity suitable for an archives file system.