This is the accessible text file for GAO report number GAO-02-586 entitled 'Information Management: Challenges in Managing and Preserving Electronic Records' which was released on June 17, 2002. This text file was formatted by the U.S. General Accounting Office (GAO) to be accessible to users with visual impairments, as part of a longer term project to improve GAO products’ accessibility. Every attempt has been made to maintain the structural and data integrity of the original printed product. Accessibility features, such as text descriptions of tables, consecutively numbered footnotes placed at the end of the file, and the text of agency comment letters, are provided but may not exactly duplicate the presentation or format of the printed version. The portable document format (PDF) file is an exact electronic replica of the printed version. We welcome your feedback. Please E-mail your comments regarding the contents or accessibility features of this document to Webmaster@gao.gov.
Report to Congressional Requesters: June 2002: Information Management: Challenges in Managing and Preserving Electronic Records: GAO-02-586:
Highlights of GAO-02-586, a report to Congressional Requesters: Why GAO Did This Study: In the wake of the transition from paper-based to electronic processes, federal agencies are producing vast and rapidly growing volumes of electronic records. The difficulties of managing, preserving, and providing access to these records represent challenges for the National Archives and Records Administration (NARA) as the nation’s recordkeeper and archivist. GAO was requested to (1) determine the status and adequacy of NARA’s response to these challenges and (2) review NARA’s efforts to acquire an advanced electronic records archiving system, which will be based on new technologies that are still the subject of research.
What GAO Found: NARA has taken action to respond to the challenges associated with managing and preserving electronic records. In 2001, NARA completed an assessment of the current federal recordkeeping environment. This study concluded that although agencies are creating and maintaining records appropriately, most electronic records (including databases of major federal information systems) remain unscheduled (that is, their value has not been assessed nor their disposition determined), and records of historical value are not being identified and provided to NARA for archiving. As a result, valuable electronic records may be at risk of loss. Part of the problem is that records management guidance is inadequate in the current technological environment of decentralized systems producing large volumes of complex records. Another factor is the low priority often given to records management programs and the lack of technology tools to manage electronic records. Finally, NARA does not perform systematic inspections of agency records management, and so it does not have comprehensive information on implementation issues and areas where guidance needs strengthening. Although NARA plans to improve its guidance and address technology issues, its plans do not address the low priority generally given to records management programs nor the inspection issue. Recognizing the limitations of its technical strategies to support preservation, management, and sustained access to electronic records, NARA is planning to design, acquire, and manage an advanced electronic records archive; however, this project faces substantial risks. Although the electronic records archive project is in its initial stages, it is already falling behind schedule. Further, to acquire a major system of this kind, NARA needs to improve its information technology (IT) management capabilities, and although it has made progress in doing so, its efforts are not yet complete.
What GAO Recommends: GAO recommends that the Archivist of the United States develop documented strategies to raise awareness of the importance of records management programs and for conducting systematic inspections of these programs. In addition, to reduce risks, GAO recommends that the Archivist reassess the schedule for acquiring the new archival system so that the agency can complete key planning tasks and address IT management weaknesses. In commenting on a draft of this report, the Archivist agreed with our recommendations and offered clarifications, which we have incorporated as appropriate. Figure: Master Copies of Electronic Records in NARA’s Archives: [See PDF for image] Source: NARA. [End of figure] This is a test for developing highlights for a GAO report. The full report, including GAO’s objectives, scope, methodology, and analysis, is available at www.gao.gov/cgi-bin/getrpt?GAO-02-586. For additional information about the report, contact Linda Koontz, 202-512-6240. To provide comments on this test highlights, contact Keith Fultz (202-512-3200) or email HighlightsTest@gao.gov.
Contents:
Letter:
Results in Brief:
Background:
NARA Is Responding to Challenges of Electronic Records Management:
NARA’s Effort to Acquire Advanced Electronic Archival System Faces Risks:
Conclusions:
Recommendations for Executive Action:
Agency Comments and Our Evaluation:
Appendixes:
Appendix I: Objectives, Scope, and Methodology:
Appendix II: Approaches to Archiving Electronic Records Provide Partial Solutions:
Appendix III: NARA’s Electronic Records Guidance Has Evolved:
Appendix IV: Agencies Are Managing Large Volumes of Important Electronic Records:
Appendix V: Comments from the National Archives and Records Administration:
Glossary:
Table:
Table 1: Timeline for ERA Program:
Figures:
Figure 1: Removable Hard Drives and Backup Devices Used by Independent Counsel Staff:
Figure 2: Master Copies of Electronic Records in NARA’s Archives:
Figure 3: OAIS Model and Its Components:
Figure 4: Sample of XML Version of State Department Telegram:
Figure 5: The Long Now Foundation Rosetta Disk Language Archive:
Figure 6: Internet Archive Collection of Presidential Candidate Web Sites:
Figure 7: Google’s Usenet Archive:
Abbreviations:
ASCII: American Standard Code for Information Interchange:
DARPA: Defense Advanced Research Projects Agency:
DOD: Department of Defense:
EAST: Examiners Automated Search Tool:
ERA: Electronic Records Archive:
GAO: General Accounting Office:
GIS: Geographic Information System:
GRS: General Records Schedule:
GSA: General Services Administration:
HTML: Hypertext Markup Language:
HUD: Housing and Urban Development:
IG: Inspector General:
IT: information technology:
NARA: National Archives and Records Administration:
NASA: National Aeronautics and Space Administration:
OAIS: Open Archival Information System:
OMB: Office of Management and Budget:
PMO: program management office:
POP: persistent object preservation:
PTO: U.S. Patent and Trademark Office:
SAS: State Archiving System:
SF: standard form:
VERS: Victorian Electronic Record Strategy:
WEST: Web Examiner Search Tool:
XML: Extensible Markup Language:
Letter: June 17, 2002:
The Honorable Stephen Horn Chairman, Subcommittee on Government Efficiency, Financial Management and Intergovernmental Relations Committee on Government Reform House of Representatives:
The Honorable Ernest J. Istook, Jr. Chairman, Subcommittee on Treasury, Postal Service and General Government Committee on Appropriations House of Representatives:
Agencies are increasingly moving to an operational environment in which electronic--rather than paper--records provide comprehensive documentation of their activities and business processes. Although this transformation has improved the way federal agencies work and interact with each other and with the public, it has also created the new challenge of managing and preserving vast and rapidly growing volumes of electronic records. Because these records document essential government functions and provide information necessary to protect government and citizen interests, their proper management is essential for ongoing government activities; further, the preservation of significant documents and other records is crucial for the historical record. Overall responsibility for the government’s electronic records lies with the National Archives and Records Administration (NARA), which carries out a dual mission for the nation: oversight of records management, which governs the life cycle of records (creation, maintenance and use, and disposition), and archiving, which is the permanent preservation of documents and other records of historical interest. In carrying out these missions, NARA and agencies use a process known as scheduling to assess the value of records and determine their disposition. The challenges associated with managing and preserving electronic records have long been recognized throughout government.
Because of concern about these issues, you requested that we review electronic records management and preservation activities at NARA. Our objectives were to: * determine the status of NARA’s efforts to respond to governmentwide electronic records management problems and the adequacy of its planned actions and: * assess NARA’s efforts to acquire an archival system for electronic records. As part of our assessment of NARA’s efforts to acquire an electronic records archiving system, you also asked that we identify alternative technologies under consideration for the long-term preservation of electronic records. To address our objectives, we reviewed applicable guidance and other documentation; surveyed NARA’s appraisal archivists working with federal agencies; reviewed records management activities and obtained the views of record managers in selected federal agencies managing large volumes of electronic records; and reviewed legal challenges to federal electronic recordkeeping practices. We reviewed agency and contractors’ documentation for the electronic records archive program and assessed NARA’s effort to develop or enhance its information technology capabilities. Further details on our objectives, scope, and methodology are provided in appendix I. Results in Brief: NARA has taken action to respond to the challenges associated with managing and preserving electronic records. In 2001, NARA completed an assessment of the current federal recordkeeping environment; this study concluded that although agencies are creating and maintaining records appropriately, most electronic records (including databases of major federal information systems) remain unscheduled, and records of historical value are not being identified and provided to NARA for preservation in archives. As a result, valuable electronic records may be at risk of loss. 
Part of the problem is that records management guidance is inadequate in the current technological environment of decentralized systems producing large volumes of complex records. Another factor is the low priority often given to records management programs and the lack of technology tools to manage electronic records. Finally, NARA does not perform systematic inspections of agency records and records management programs, and so it does not have comprehensive information allowing it to identify records management implementation issues and areas where its guidance needs to be strengthened. NARA plans to improve its guidance and to address technology issues. However, NARA’s plans do not address the low priority generally given to records management programs nor the issue of systematic inspections. Recognizing the limitations of its technical strategies to support preservation, management, and sustained access to electronic records, NARA is planning to design, acquire, and manage an advanced electronic records archive (ERA); however, this project faces substantial risks. NARA is behind schedule for the ERA system, largely because of flaws in how the schedule was developed. Further, to acquire a major system like ERA, NARA needs to improve its information technology (IT) management capabilities, and although it has made progress in doing so, its efforts are not yet complete. Regarding alternative archiving technologies for electronic records, we found that archival organizations now rely on a mixture of evolving approaches that generally fall short of solving the long-term preservation problem. Appendix II provides a detailed discussion of these approaches. In light of the continuing challenge of managing federal records, both electronic and otherwise, we are recommending that the Archivist of the United States develop a strategy for raising awareness of the importance of federal records management programs and for performing systematic inspections. 
In addition, to mitigate the risks associated with developing the new archival system, we are recommending that the Archivist reassess the schedule for this effort. In commenting on a draft of this report, the Archivist stated that more must be done to address the enormous challenges in managing and preserving electronic records and agreed with the report’s recommendations. He also offered clarifications concerning records management priority, inspections, and the ERA schedule that we have incorporated as appropriate. Background: Advances in information technology and the explosion in computer interconnectivity brought about by the Internet are irreversibly changing the way we communicate and conduct business. Office automation applications and networked desktop computers are providing the capability to rapidly create and share electronic documents, use Web sites for executing business and financial transactions, and instantaneously communicate with individuals and groups. While the transformation from a paper-based to an electronic business environment has led to improvements in the way federal agencies do business, both with each other and with the public, it has also created the new challenge of managing and preserving electronic records, which must be approached differently from their paper counterparts. Unlike paper records, electronic records are not tangible, come in many formats, and depend on the hardware and software with which they were created. NARA’s mission is to ensure “ready access to essential evidence” for the public, the President, the Congress, and the Courts. 
NARA’s responsibilities stem from the Federal Records Act,[Footnote 1] which requires each federal agency to make and preserve records that (1) document the organization, functions, policies, decisions, procedures, and essential transactions of the agency and (2) provide the information necessary to protect the legal and financial rights of the government and of persons directly affected by the agency’s activities. Effective management of these records is critical for ensuring that sufficient documentation is created; that agencies can efficiently locate and retrieve records needed in the daily performance of their missions; and that records of historical significance are identified, preserved, and made available to the public. According to NARA, without effective records management, the records needed to document citizens’ rights, actions for which federal officials are responsible, and the historical experience of the nation will be at risk of loss, deterioration, or destruction. Under the act, NARA is responsible for oversight of records management and archiving. Records management--that is, the policies, procedures, guidance, tools and techniques, resources, and training needed to design and maintain reliable and trustworthy records systems--governs the life cycle of records from creation, through maintenance and use, to final disposition. Archiving is the permanent preservation of records documenting the activities of the government. NARA thus oversees agency management of temporary records used in everyday operations and ultimately takes control of permanent agency records judged to be of historic value.[Footnote 2] Of the total number of federal records, less than 3 percent are designated permanent. 
NARA Is Responsible for Oversight of Records Management: NARA is responsible for issuing records management guidance; working with agencies to implement effective controls over the creation, maintenance, and use of records in the conduct of agency business; providing oversight of agencies’ records management programs; and providing storage facilities for certain temporary agency records. The Federal Records Act also authorizes NARA to conduct inspections of agency records and records management programs. NARA works with agencies to identify and inventory records, appraise their value, and determine whether they are temporary or permanent, how long the temporary records should be kept, and under what conditions both the temporary and permanent records should be kept. This process is called scheduling. No record may be destroyed unless it has been scheduled, and for temporary records the schedule is of critical importance because it provides the authority to dispose of the record after a specified time period. Records are governed by schedules that are specific to an agency or by a general records schedule, which covers records common to several or all agencies. According to NARA, records covered by general records schedules make up about a third of all federal records. For the other two thirds, NARA and the agencies must agree upon specific records schedules. Once a schedule has been approved, the agency must issue it as a management directive, train employees in its use, apply its provisions to temporary and permanent records, and evaluate the results. While the Federal Records Act covers documentary material regardless of physical form or media, records management and archiving were until recently largely focused on handling paper documents. With the advent of computers, both records management and archiving have had to take into account the creation of records in varieties of electronic formats. 
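The scheduling rule described above (no destruction without a schedule, and disposal of temporary records only after the period the schedule specifies) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not NARA's procedure or any agency system: the names Record, Disposition, may_destroy, cutoff, and retention_years are hypothetical, and the six-year retention period in the example is invented for demonstration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Disposition(Enum):
    UNSCHEDULED = "unscheduled"  # value not yet appraised; no disposal authority
    TEMPORARY = "temporary"      # may be disposed of after the scheduled period
    PERMANENT = "permanent"      # to be preserved in the archives, never destroyed


@dataclass(frozen=True)
class Record:
    title: str
    disposition: Disposition
    cutoff: Optional[date] = None          # hypothetical: date the retention clock starts
    retention_years: Optional[int] = None  # hypothetical: period set by the schedule


def may_destroy(record: Record, today: date) -> bool:
    """Return True only if a schedule authorizes disposal as of `today`."""
    # Unscheduled and permanent records carry no authority to destroy.
    if record.disposition is not Disposition.TEMPORARY:
        return False
    # A temporary record without a defined retention period cannot be disposed of.
    if record.cutoff is None or record.retention_years is None:
        return False
    # Compare (year, month, day) tuples so a February 29 cutoff needs no special case.
    expiry = (record.cutoff.year + record.retention_years,
              record.cutoff.month, record.cutoff.day)
    return (today.year, today.month, today.day) >= expiry


# Illustrative data only; the retention period is made up for this example.
voucher = Record("travel voucher", Disposition.TEMPORARY, date(1995, 1, 1), 6)
database = Record("program database", Disposition.UNSCHEDULED)
print(may_destroy(voucher, date(2002, 6, 17)))   # True
print(may_destroy(database, date(2002, 6, 17)))  # False
```

The sketch treats an unscheduled record exactly like a permanent one for disposal purposes, which mirrors the point that the schedule itself is what provides the authority to destroy.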
NARA’s basic guidance for the management of electronic records is in the form of a regulation at 36 CFR Part 1234. This guidance is supplemented by the issuance of periodic NARA bulletins and a records management handbook, Disposition of Federal Records. NARA’s guidance has two basic requirements. First, agencies are required to maintain an inventory of all agency information systems. The inventory should identify (1) the system’s name; (2) its purpose; (3) the agency programs supported by the system; (4) data inputs, sources, and outputs; (5) the information content of databases; and (6) the system’s hardware and software environment. Second, NARA requires agencies to schedule the electronic records maintained in their systems. Agencies must schedule those records either under specific schedules, completed through submission and approval of Standard Form 115 (SF 115), Request for Records Disposition Authority, or pursuant to a general records schedule. NARA relies on this combination of inventory and scheduling requirements to ensure the management of agency electronic records consistent with the Federal Records Act. NARA has also established a general records schedule for electronic records. General Records Schedule 20 (GRS 20) authorizes the disposal of certain categories of temporary electronic records. It has been revised several times over the years in response to developments in information technology, as well as legal challenges. (App. III provides a discussion of the evolution of electronic records guidance and legal challenges to GRS 20.) As it stands now, GRS 20 applies to electronic records created both in computer centers engaged in large-scale data processing and in the office automation environment. With regard to computer centers, GRS 20 authorizes the disposal of certain types of scheduled electronic records associated with large database systems, such as inputs, outputs, and processing files.
With regard to the office desktop environment, GRS 20 authorizes the deletion of the electronic version of records on word processing and electronic mail systems once a recordkeeping copy has been made. In addition, it authorizes deletion of electronically generated administrative spreadsheets and other administrative records that are included in recordkeeping systems that have been authorized for disposal by NARA. Since most agency “recordkeeping systems” are paper files, GRS 20 essentially authorizes agencies to destroy E-mail and word-processing files once they are printed. As already noted, records not covered by a general records schedule may not be destroyed unless authorized by a records schedule that has been approved by NARA. GRS 20 does not address many common products of electronic information processing, particularly those that result from the now prevalent distributed, end-user computing environment. For example, although the guidance addresses the disposition of certain types of electronic records associated with large databases, it does not specifically address the disposition of electronic databases created by microcomputer users. In addition, while addressing word processing and E-mail records, GRS 20 does not address more recent forms of electronic records such as Web pages and portable document format (PDF) files. [Footnote 3]: NARA Archives Permanent Records of Historical Interest: As the nation’s archivist, NARA accepts for deposit to its archives those records of federal agencies, the Congress, the Architect of the Capitol, and the Supreme Court that are determined to have sufficient historical or other value to warrant their continued preservation by the U.S. government. 
NARA also accepts papers and other historical materials of the Presidents of the United States, documents from private sources that are appropriate for preservation (including electronic records, motion picture films, still pictures, and sound recordings), and records from agencies whose existence has been terminated, including Offices of Independent Counsel (see fig. 1). Figure 1: Removable Hard Drives and Backup Devices Used by Independent Counsel Staff: [See PDF for image] Source: NARA. [End of figure] NARA archives vast quantities of federal records in various formats. Its archival facilities (a network of regional archives) hold over 21 million cubic feet of original textual materials, while its multimedia collections include nearly 300,000 reels of motion picture film; more than 5 million maps, charts, and architectural drawings; over 200,000 sound and video recordings; about 9 million aerial photographs; nearly 14 million still pictures and posters; and over 87,000 computer data sets stored on computer tapes and cartridges (see fig. 2). Figure 2: Master Copies of Electronic Records in NARA’s Archives: [See PDF for image] Source: NARA. [End of figure] In addition to its archives, NARA also manages the archival holdings of 10 presidential libraries, the Nixon presidential materials staff, and the Clinton presidential materials project. These include over 400 million paper records, over 15 million feet of film, nearly 10 million still pictures, nearly 100,000 hours of audio and video recordings, and almost half a million museum objects. The types of electronic records that NARA currently accepts for archiving are limited to those that are independent of specified hardware or software and are in text-based formats, such as databases and certain text-based geographic information system (GIS)[Footnote 4] files. 
NARA does not accept digital images, Web pages, word processor files, relational databases, or any records with complex structure.[Footnote 5] (Although NARA does not as yet accept such files for archiving, they must still be scheduled.): Management and Preservation of Electronic Records Pose Major Challenges: During the last four decades, archiving--the permanent preservation of information of enduring value for access by future generations--has undergone a major change. Before the advent of large bureaucracies supported by the now ubiquitous computer, archivists dealt with a scarcity of sources, with much of their efforts focused on tracking down unique manuscripts or recovering incomplete files.[Footnote 6] The archived records were relatively durable--clay tablets, stone, parchment, vellum, or rag paper. Albeit scarce and often incomplete, these records came down through the centuries relatively intact and could be preserved with little or no difficulty. The growth of the government, complex organizations, and the advent of the electronic age have reversed the conditions facing today’s archives: rather than dealing with scarce sources, the archives are facing a flood of potentially valuable information stored on fragile materials, including pulp paper and computer tapes and disks. While the preservation of information recorded on traditional materials such as paper or film requires significant resources, the current major archival challenge is the preservation of electronic records. Like traditional archival materials--books, papers, or film--electronic information is recorded on media that deteriorate with age. However, unlike the traditional archival materials, electronic records are stored in specific formats and cannot be read without software and hardware--sometimes the specific types of hardware and software on which they were created.
The rapid evolution of information technology makes the task of managing and preserving electronic records complex and costly. Agencies are increasingly moving to an operational environment in which electronic--rather than paper--records provide comprehensive documentation of their activities and business processes. Part of the challenge of managing electronic records is that they are produced by a mix of information systems, which vary not only by type but by generation of technology: the mainframe, the personal computer, and the Internet. Each generation of technology brought in new systems and capabilities without displacing the older systems.[Footnote 7] Thus, organizations have to manage and preserve electronic records associated with a wide range of systems, technologies, and formats. The challenge of managing and preserving vast and rapidly growing volumes of electronic records produced by modern organizations is placing pressure on the archival community and on the information industry to develop a cost-effective long-term preservation strategy that would free electronic records of the straitjacket of proprietary file formats and software and hardware dependencies. This challenge is affected by several factors: decentralization of the computing environment, the complexity of electronic records, obsolescence and aging of storage media, massive volumes of electronic records, and software and hardware dependencies. * Decentralization of computing environment: The challenge of managing electronic records significantly increases with the decentralization of the computing environment. In the centralized environment of a mainframe computer, it is relatively easy to identify, assess, and manage electronic records. This is not the case in the decentralized environment of agencies’ office automation systems, where every user is creating electronic files that may constitute a formal record and thus should be preserved. 
* Complexity of electronic records: Electronic records have evolved from simple text-based files to complex digital objects that may contain embedded images (still and moving), drawings, sounds, hyperlinks, or spreadsheets with computational formulas. Some portions of electronic records, such as the content of dynamic Web pages, are created on the fly from databases and exist only during the viewing session. Others, such as E-mail, may contain multiple attachments, and they may be threaded (that is, related E-mail messages are linked into send-reply chains). These records cannot be converted to paper or text formats without the loss of context, functionality, and information.
* Obsolescence and aging of storage media: Storage media are affected by the dual problems of obsolescence and decay. They are fragile, have limited shelf life, and become obsolete in a few years. Few computers today have disk drives that can read information stored on 8- or 5¼-inch diskettes, even if the diskettes themselves remain readable.
* Massive volumes: Electronic records are increasingly being created in volumes that pose a significant technical challenge to our ability to organize them and make them accessible. For example, among the candidates for archiving are military intelligence records comprising more than 1 billion electronic messages, reports, cables, and memorandums, as well as over 50 million electronic court case files.
* Software and hardware dependency: Electronic records are created on computers with software ranging from word processors to E-mail programs. As computer hardware and application software become obsolete, they may leave behind electronic records that cannot be read without the original hardware and software.
Past GAO Work Highlighted Electronic Records Challenges: In July 1999, we reported that NARA and federal agencies were facing the substantial challenge of preserving electronic records in an era of rapidly changing technology.[Footnote 8] In that report we stated that in addition to handling the burgeoning volume of electronic records, NARA and the agencies would have to address several hardware and software issues to ensure that electronic records were properly created, maintained, secured, and retrievable in the future. We also noted that NARA did not have governmentwide data on the records management capabilities and programs of all federal agencies. As a result, we recommended that NARA conduct a governmentwide survey of agencies’ electronic records management programs and use the information as input to its efforts to reengineer its business processes. NARA’s subsequent efforts to assess governmentwide records management practices and study the redesign of its business processes are discussed later in this report. Agencies Are Beginning to Automate Management of Electronic Records: In response to the difficulty of manually managing electronic records, agencies are slowly turning to automated records management applications to help automate electronic records management life-cycle processes. The primary functions of these applications include categorizing and locating records and identifying records that are due for disposition, as well as storing, retrieving, and disposing of electronic records that are maintained in repositories. Also, some applications are beginning to be designed to automatically classify electronic records and assign them to an appropriate records retention and disposition category. 
The Department of Defense (DOD), which is pioneering the assessment and use of records management applications, has published application standards and established a certification program.[Footnote 9] The DOD standard, endorsed by NARA, includes the requirement that records management applications acquired by DOD components after 1999 be certified to meet this standard.[Footnote 10] As of March 2002, DOD had certified 31 applications. NARA was testing one of the DOD-certified electronic records management applications, and it will be assessing the second version of the DOD standard to determine whether it can or should become a governmentwide standard. Theory, Methods, and Model for Long-Term Preservation of Electronic Records Are Being Developed: NARA is not alone in facing the challenges posed by electronic records, particularly long-term preservation. There is a general consensus in the archival community that a viable strategy for the long-term preservation and archiving of electronic records has yet to be developed. Accordingly, archives scholars, national archival and library institutions, and private industry representatives are collaborating on major initiatives to develop the theoretical and methodological knowledge needed for the permanent preservation of records created in electronic systems. These initiatives include the following: * The International Research on Permanent Authentic Records in Electronic Systems project is a major two-phase international research project in which archival and computer engineering scholars, national archival institutions (including NARA), and private industry representatives are collaborating to develop the theoretical and methodological knowledge required for the permanent preservation of authentic records created in electronic systems. 
The first phase of the project, focusing on records generated in databases and document management systems, was recently completed; the second phase (2002 to 2006) deals with the issues of authenticity, reliability, and accuracy of records produced in new digital environments. * The Library of Congress’ National Digital Information Infrastructure and Preservation Program is a national cooperative effort led by the Library to develop the strategy and technical approaches needed to archive and preserve digital information; NARA is also participating in this effort. The program is in an early stage; completion is not expected until 2004 or 2005, when the Library will provide recommendations to the Congress. * NARA is collaborating in a joint effort on electronic record archiving with the Defense Advanced Research Projects Agency (DARPA), the U.S. Patent and Trademark Office, the National Partnership for Advanced Computational Infrastructure, and the San Diego Supercomputer Center. Led by DARPA, the collaboration aims to develop and demonstrate architectures and technologies for electronic archiving and the development of persistent object preservation, a proposed technique for electronic archiving (discussed in app. II). These initiatives are all in their early stages; none of them has yet yielded proof-of-concept prototypes demonstrating the viability of a long-term solution to preserving and accessing electronic records. Progress has been made, however, in the development of a standard model for electronic archiving systems. The Open Archival Information System (OAIS) model, which is currently emerging as a standard in the archival community, was initially developed by the National Aeronautics and Space Administration (NASA) for archiving the large volumes of data produced by space missions. However, the model is applicable to any archive, digital library, or repository. 
As a standard framework for long-term preservation archives, the model defines the environment necessary to support a digital repository and the interactions within that environment. According to NASA, it also promotes the understanding and increased awareness of archival concepts needed for long-term digital information preservation and access, as well as for describing and comparing architectures and operations of existing and future archives. Many institutions have already chosen to use the framework of the OAIS reference model to guide their digital preservation efforts, including the National Library of the Netherlands, NARA (in conjunction with the development of its electronic records archiving project), NASA’s National Space Science Data Center, and many commercial organizations. The OAIS model (see fig. 3) breaks the archiving system down into six distinct functional areas: ingest, archival storage, data management, administration, preservation planning, and access. * In the ingest area, systems accept information submitted from outside the framework and prepare the contents for storage. This functional area also includes systems to generate descriptive information to allow future management within the archive. * In the archival storage area, systems pass the information, now called archival information packages, into a storage repository, where it is maintained until the contents are requested and retrieved. * The data management area encompasses the services and functions for populating, maintaining, and accessing both descriptive information that identifies and documents archive holdings and administrative data used to manage the archive. * The administration area provides the services and functions for the overall operation of the archive system. 
* In the preservation planning area, systems monitor the environment of the OAIS and provide recommendations to ensure that the information stored in the OAIS remains accessible, even if the original computing environment becomes obsolete. * The access area includes systems that allow a user to determine the existence, description, location, and availability of information stored in the OAIS, allowing information products to be requested and received. Figure 3: OAIS Model and Its Components: [See PDF for image] Source: Consultative Committee for Space Data Systems. [End of figure] The OAIS framework does not presume or apply any particular preservation strategy. This approach allows organizations that adopt the framework to apply their own strategies or combinations of strategies. The framework does assume that the information managed is produced outside the OAIS, and that the information will be disseminated to users who are also outside the system. Because the model is simplified to include only functions common to all repositories, it allows institutions to focus on the approaches necessary to preserve the information. NARA Is Responding to Challenges of Electronic Records Management: NARA is taking action to respond to long-standing problems associated with managing and preserving electronic records in archives. In 2001, NARA completed an assessment of governmentwide records management practices. This assessment concluded that although agencies are creating sufficient records and maintaining them appropriately, most electronic records remain unscheduled, and permanent records of historical value are not being identified and provided to NARA for preservation and archiving. As a result, potentially valuable records may be at risk. 
According to the study, the problems in electronic records management appear to stem from (1) inadequate governmentwide records management guidance and (2) the low priority traditionally given to federal records management functions and a lack of technology tools to manage electronic records. To address these problems, NARA now plans to (1) analyze key policy issues related to the disposition of records and improve its guidance and (2) examine and redesign, if necessary, the scheduling and appraisal process and make this process more effective through the use of technology. NARA’s plans, however, do not address the low priority given to records functions. Further, these plans do not address the need to monitor performance of records management programs and practices on an ongoing basis. NARA’s Assessment of Federal Records Practices Identifies Problems: Records must be effectively managed throughout their life cycle, which includes records creation, maintenance and use, and scheduling and disposition. Agencies must create reliable records that meet the business needs and legal responsibilities of federal programs and (to the extent known) the needs of internal and external stakeholders who may make secondary use of the records. To maintain and use the records created, agencies are to create internal recordkeeping requirements for maintaining records, consistently apply these requirements, and establish systems that allow them to find records that they need. Scheduling is the means by which NARA and agencies identify federal records, determine time frames for disposition, and identify permanent records of historical value that are to be transferred to NARA for preservation and archiving. With regard particularly to electronic records, agencies are also to compile inventories of their information systems, after which the agency is required to develop a schedule for the electronic records maintained in those systems. 
In 2001, NARA completed an assessment of governmentwide records management practices, as recommended in our prior work. The assessment included a recordkeeping study performed by a contractor--SRA International--and a series of records system analyses performed by NARA staff. The SRA study was based on a survey of federal employees representing over 150 federal government organizations and on 54 focus groups and interviews involving individuals from 18 agencies; the NARA staff’s records system analyses focused on records management practices for key business processes in 11 federal agencies. The resulting NARA/SRA study identified problems in agency records management.[Footnote 11] Specifically, NARA’s assessment of records management for key processes in 11 agencies concluded the following. * Records creation: In general, the NARA study showed that the processes that were studied appeared to generate adequate records documentation. * Records maintenance and use: For the most part, recordkeeping requirements were adequate, documented, and consistently applied. In addition, employees were generally able to find the records that they needed. * Records scheduling and disposition: The study identified significant problems in both records scheduling and disposition. According to the study, many significant records--as well as most federal electronic records--are unscheduled. In addition to the unscheduled records, NARA identified several significant records that had been improperly scheduled. The study concluded that records scheduling was clearly a problem area. Our review at four agencies (Commerce, Housing and Urban Development, Veterans Affairs, and State) provides confirmation of this result, eliciting a collective estimate that less than 10 percent of mission-critical systems were inventoried. 
The number of mission-critical systems at these four agencies was reported to be 907, according to information collected by the Office of Management and Budget in November 1999 as part of the federal government’s effort to assess the Year 2000 computing challenge.[Footnote 12] Thus for these four agencies alone, over 800 systems had not been inventoried and the electronic records maintained in them had not been scheduled. Scheduling the electronic records in a large number of major information systems presents an enormous challenge, particularly since it generally takes NARA, in conjunction with agencies, well over 6 months to approve a new schedule.[Footnote 13] Failure to inventory systems and schedule records places these records at risk. The absence of inventories and schedules means that NARA and agencies have not examined the contents of these information systems to identify official government records, appraised the value of these records, determined appropriate disposition, and directed and trained employees in how to maintain and when and how to dispose of these records. As a result, temporary records may remain on hard drives and other media long after they are needed or could be moved to less costly forms of storage. In addition, there is increased risk that these records may be deleted prematurely while still needed for fiscal, legal, and administrative purposes. The lack of scheduling presents particular risks to the preservation of permanent records of historic significance. NARA’s study of 11 agencies found instances where valuable permanent electronic records were not being appropriately transferred to NARA’s archives because these records had not been scheduled, appraised, identified as permanent, and placed under the control of the agency’s records program. This lack of management control places these valuable records at increased risk of loss, destruction, and deterioration. 
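The inventory figures cited above can be checked with simple arithmetic. The following is a minimal sketch in Python; it treats the agencies' estimate of "less than 10 percent" as an upper bound of 10 percent, which is an assumption made for illustration, and the variable names are hypothetical rather than terms used by NARA or the Office of Management and Budget.

```python
# Illustrative check of the inventory figures cited in the text.
total_systems = 907  # mission-critical systems reported for the four agencies

# "Less than 10 percent" inventoried is treated as an upper bound (assumption).
# Integer arithmetic avoids floating-point rounding: 907 * 10 // 100 == 90.
max_inventoried = total_systems * 10 // 100

# Lower bound on the number of systems that had not been inventoried.
min_uninventoried = total_systems - max_inventoried

print(min_uninventoried)  # 817
```

Under this assumption, at least 817 of the 907 systems had not been inventoried, which is consistent with the statement that over 800 systems remained uninventoried and their records unscheduled.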
NARA’s Records Management Guidance Has Not Kept Pace with the Challenges of Electronic Records: The NARA/SRA study identified the lack of sufficient governmentwide guidance as one cause of records management problems. As NARA has acknowledged, its policies and processes on electronic records have not yet evolved to reflect the modern recordkeeping environment: records created electronically in decentralized processes.[Footnote 14] Despite repeated attempts to clarify its electronic records guidance through a succession of NARA bulletins, the current guidance remains incomplete and confusing. According to the study, for example, employees lack knowledge concerning how to identify electronic records and what to do with them once identified. The guidance does not provide disposition instructions for electronic records maintained in many of the common types of formats produced by federal agencies, including PDF files, Web pages, and spreadsheets. To support their missions, many agencies must maintain such records--often in large volumes--with little guidance from NARA (see app. IV for a discussion of the records management challenges faced by selected agencies). The NARA/SRA study concluded that while agencies appreciate the specific assistance from NARA personnel, they are frustrated because they perceive that NARA is not meeting agencies’ broader needs for guidance and records management leadership. This study reported that agencies believe that NARA has a responsibility to lead the way in transitioning to an electronic records environment and to provide guidance and standards, as well as tools to enable agencies to follow the guidance. According to the study, some viewed NARA as leaving agencies to fend for themselves, sometimes levying impossible requirements that pressure agencies to come up with their own individual solutions. 
Agency Records Management Programs Are Given Low Priority and Lack Technology Tools: The NARA/SRA study identified another cause of records management difficulties: the low priority generally afforded to records management programs. The study states that records management is not even “on the radar scope” of agency leaders. Further, records officers have little clout and do not appear to have much involvement in or influence on programmatic business processes or the development of information systems designed to support them. New government employees seldom receive any formal, initial records management training. One agency told NARA that records management is “number 26 on our list of top 25 priorities.” The study also noted that federal downsizing may have negatively affected records management and staffing resources in agencies. Further, records management is generally considered a “support” activity. Since support functions are typically the most dispensable in agencies, resources for and focus on these functions are often limited. This finding was echoed by a recent review of archival practices of research universities, corporate research and development programs, and federal science agencies, which noted that “agency records management programs lack the resources to meet even the legally required standards of securing adequate documentation of their programs and activities.”[Footnote 15] As indicated by the NARA/SRA study, a related issue is the technical challenge of electronic records management: effective electronic records management may require more sophisticated and expensive information technology (such as automated electronic records management systems) than was previously necessary for paper-based records management programs. Because management tends not to focus on records management, priority has not been given to acquiring or upgrading the technology required to manage records in an electronic environment. 
The study noted that technology tools for managing electronic records do not exist in most agencies, and further, that agency information technology environments have not been designed to facilitate the retention and retrieval of electronic records. As a result, despite the growth of electronic media, agency records systems are predominantly in paper format rather than electronic. The study further noted that agencies planning or piloting automated electronic records management systems perform better recordkeeping than those without such tools. Typically, such agencies are already performing better recordkeeping, and they tend to invest in electronic records management systems because of the value they place on good records management. According to the study, many agencies are either planning or piloting information technology initiatives to support electronic records management, but their movement to electronic systems is constrained by the level of financial support provided for records management. Inspections of Federal Electronic Records Programs Are Limited: A possible further cause of agency records management problems, not addressed in the NARA/SRA study, is the limited nature of NARA’s current inspection program. NARA is responsible, under the Federal Records Act, for conducting inspections or surveys of agency records and records management programs and practices. Its implementing regulations require NARA to select agencies to be inspected (1) on the basis of perceived need by NARA, (2) by specific request by the agency, or (3) on the basis of a compliance monitoring cycle developed by NARA. [Footnote 16] In all instances, NARA is to determine the scope of the inspection. Such inspections provide not only the means to assess and improve individual agency records management programs but also the opportunity for NARA to determine overall progress in improving agency records management and identify problem areas that need to be addressed in its guidance. 
Between 1996 and 2000, NARA performed 16 inspections of agency records management programs, or about 3 per year. These reviews were systematic and comprehensive, covering all aspects of an agency’s records program. However, only 2 of the 24 major executive departments or agencies were evaluated, with most of NARA’s evaluations focused on component organizations or independent agencies. Moreover, these evaluations frequently bypassed the issue of electronic records. In 2000, NARA replaced agency evaluations with a new inspection approach--targeted assistance. NARA decided that its previous approach to inspections was basically flawed: besides reaching only a few agencies, it was often perceived negatively by agencies and resulted in a list of records management problems that agencies then had to resolve on their own. Under the targeted assistance approach, NARA enters into partnerships with federal agencies to provide them with guidance, assistance, or training in any area of records management. Services offered include expedited review of critical schedules, tailored training, and help in records disposition and transfer. However, although this approach may improve records management in the targeted agencies, it is not a substitute for systematic inspections and evaluations of federal records programs. Because the targeted assistance program is voluntary and, according to NARA, initiated by a written request from the agency, relying on it exclusively could significantly limit NARA’s evaluations of federal recordkeeping. First, only agencies requesting targeted assistance--presumably those already having greater appreciation of the importance of records management--are evaluated. Second, the scope and the focus of the targeted assistance are not determined by NARA but by the requesting agency. 
NARA Is Addressing Records Management Problems, but Additional Opportunities Exist: NARA has recognized that its policy and regulations for the management and disposition of electronic records must be revised to provide agencies with clear and comprehensive guidance encompassing all types and formats of electronic records. Having completed its assessment of federal records management practices, NARA now plans a two-phase project to (1) analyze key policy issues related to the disposition of records and improve governmentwide guidance, and (2) examine and redesign, if necessary, the scheduling and appraisal process and make this process more effective through the use of technology. According to NARA, the purpose of the first phase of the project is to analyze and make decisions, as necessary, on key policy issues related to determining the disposition of records. NARA plans to evaluate current legislation, regulations, and guidance to determine if these are adequate in the current recordkeeping environment. NARA expects the outcome of the first phase, scheduled for completion by the end of fiscal year 2002, to be policy decisions that support the appropriate disposition of all government documentation in today’s multimedia environment.[Footnote 17] These results are also intended, as recommended in our prior work, to inform the redesign of the current scheduling and appraisal process planned for the second phase of the project, the development of electronic recordkeeping requirements, and improvements to records management guidance and assistance to agencies. In the second phase, NARA plans to examine and redesign, if necessary, the process used by the federal government to determine the disposition of records. This is planned as a multiyear process (2003 to 2006) during which NARA intends to address the scheduling and appraisal of federal records in all formats. Currently, it takes NARA well over 6 months to approve a new schedule. 
According to NARA, the extensive appraisal time delays action on the disposition of records and discourages agencies from submitting schedules, potentially putting essential evidence at risk. NARA has two goals for this project: (1) making the process for determining the disposition of records, regardless of medium, more effective and efficient and dramatically decreasing the amount of time it takes to get approval for the disposition of records from the Archivist of the United States, and (2) deciding how to appropriately apply technology to support the revised process for determining the disposition of records as part of managing records throughout their life cycle. Although NARA’s plans address the need to improve guidance and determine how to use technology to support records management, these plans do not address another issue raised in its study: the low priority generally given to records management and the related lack of management commitment and attention to these functions. Without a strategy to establish senior-level agency commitment to records management and raise awareness of its importance to the federal government, these programs are likely to continue to be regarded by agency management and employees as low-priority “support” functions. In addition, NARA’s plans do not address the issue of systematic inspections. While the results of its recent study provide a baseline of governmentwide records management practices, NARA’s targeted assistance approach does not provide systematic and comprehensive information to assess progress over time. Without this type of data, NARA will be impaired in its ability to determine if it is achieving results in improving agency records management. Further, NARA may not have the means to identify agency implementation issues and areas where its guidance needs to be clarified, augmented, and strengthened. 
The feedback provided by inspection is especially critical now as NARA plans to redesign the scheduling and appraisal process, and improve its guidance. NARA’s Effort to Acquire Advanced Electronic Archival System Faces Risks: Archiving--the final phase of records management for permanent records--presents a significant challenge when records are electronic. In light of the growth in the volume, complexity, and diversity of electronic records, NARA has recognized that its technical strategies to support preservation, management, and sustained access to electronic records are inadequate and inefficient. To address this challenge, the agency is pursuing two strategies. Its short-term strategy is to extend the useful life of its current systems and to create some new systems for archiving electronic records and for cataloging and displaying electronic records on-line. NARA’s long-term strategy, on which it is placing its primary focus, is to contract with a private sector firm to acquire (that is, obtain) an advanced electronic records archive (ERA). However, NARA faces substantial risks in implementing its long-term strategy. NARA is not meeting its schedule for the ERA system, largely because of flaws in how the schedule was developed. As a result, the schedule will be compressed, increasing risks. Further, although NARA recognizes that to be successful it must improve its information technology (IT) management capabilities and has made progress in doing so, these efforts are not yet complete. NARA Is Planning to Acquire an Advanced Electronic Records Archiving System: NARA’s long-term strategic initiative is to develop an advanced electronic records archive. The agency’s goals for this system are to preserve and provide access to any kind of electronic record, free from dependency on any specific hardware or software, so that the agency can carry out its mission into the future. 
Although the new archival system is not yet formally defined, agency documents, public presentations, and interviews with agency officials and staff indicate, in broad outline, how they envision this system. It will probably be a distributed system, allowing the storage and management of massive record collections at a variety of installations, with accessibility provided via the Internet. It may be based on persistent object preservation, an advanced form of file format conversion and encapsulation (described in app. II) that is the subject of research sponsored by NARA and other organizations. A leading candidate for performing this encapsulation and capturing the necessary information is the Extensible Markup Language (XML), which provides a means for “tagging” (annotating) information in a meaningful fashion that can be readily interpreted by disparate computer systems (XML is further discussed in app. II). NARA has indicated that ERA will be a major system, and that it is likely that it will be developed and implemented in several phases (or “builds”), with each phase adding more functions to the system. According to NARA, its development will take several years, and it will involve a significant expenditure of resources on program management, research, and systems development activities. NARA is planning to award the contract for the new electronic archival system in January 2004. Table 1 is a timeline showing key tasks for the program.

Table 1: Timeline for ERA Program:

Key ERA tasks: Develop vision statement; Completion dates: March 1, 2002[A].
Key ERA tasks: Develop concept of operations; Completion dates: April 1, 2002[B].
Key ERA tasks: Conduct market survey; Completion dates: June 28, 2002.
Key ERA tasks: Perform analysis of alternatives; Completion dates: July 22, 2002.
Key ERA tasks: Develop cost estimates; Completion dates: August 19, 2002.
Key ERA tasks: Develop high-level conceptual and functional requirements; Completion dates: September 24, 2002.
Key ERA tasks: Develop business case/economic analysis; Completion dates: September 30, 2002.
Key ERA tasks: Develop final functional requirements; Completion dates: December 2, 2002.
Key ERA tasks: Issue Request for Information; Completion dates: January 13, 2003.
Key ERA tasks: Release Request for Proposal; Completion dates: August 4, 2003.
Key ERA tasks: Fiscal year 2004 budget for ERA in effect; Completion dates: October 1, 2003.
Key ERA tasks: Award ERA contract; Completion dates: January 12, 2004.

[A] Completed April 18, 2002.
[B] Completed in draft on April 1, 2002.

[End of table]

To assist in this effort, NARA contracted with Integrated Computer Engineering (ICE), Incorporated,[Footnote 18] a private company experienced in systems development and acquisition. With the assistance of this contractor, NARA has been establishing the ERA program management office. Since July 2001, the program management office has been focused on developing the capability to manage the development and acquisition of the ERA system. NARA is also funding two independent assessments of the research into the technology that is proposed for ERA. These two independent assessments, conducted by the National Academy of Sciences, will review research that NARA is now sponsoring, as well as alternative approaches. The first assessment is a technical review of the viability of persistent object preservation, the architecture for persistent archives of electronic records that is being researched by the National Partnership for Advanced Computational Infrastructure (see app. II). This assessment--scheduled for completion on January 31, 2003--will address the adequacy and soundness of the persistent object preservation architecture as a whole, as well as its major components, from the points of view of computer science, systems engineering, and archival sciences. 
NARA has stated that the assessment of the persistent object information management architecture and its technical validation should be completed before ERA is developed. In its fiscal year 2002 budget hearings, NARA referred to the articulation of the persistent object preservation architecture as the one “major dependency” in its strategy for acquiring an ERA system. The second assessment will identify and evaluate alternative methods for digital preservation of records, examine the operational use of the Internet for digital archiving, and identify those aspects of the preservation of electronic records that cannot be adequately addressed either by state-of-the-art information technology or by technologies under development. It will also address the feasibility of commercializing new ideas from research. According to NARA, the second assessment is to be completed 6 to 9 months after the first. ERA Schedule Faces Significant Risks: Although the ERA project is still in its initial stages, it is already falling behind schedule. As shown in table 1, the initial deliverables for design and acquisition are late: the vision statement, due March 1, was not completed until April 18, and the concept of operations,[Footnote 19] due April 1, was delivered in draft form on that date and had not been finalized as of May 31. This lateness can be attributed to flaws in how the schedule was developed. In its tracking of ERA risks, NARA has acknowledged that the schedule for completion of tasks was based on incomplete work projections, and that its deadlines may not be achievable. Rather than constructing a plan based on estimates of the amount of work and resources required to complete each task, NARA constructed a “success oriented” schedule that was planned around ensuring that ERA was funded beginning in fiscal year 2004. 
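The slippage on the first two deliverables can be quantified from the dates given in table 1 and its notes. The following is a minimal sketch in Python using the standard `datetime` module; the variable names are illustrative assumptions, and the figure for the concept of operations is only a lower bound, because that document had not been finalized as of May 31, 2002.

```python
from datetime import date

# Planned and actual dates taken from table 1 and its notes (all in 2002).
vision_due = date(2002, 3, 1)
vision_done = date(2002, 4, 18)

conops_due = date(2002, 4, 1)
status_date = date(2002, 5, 31)  # draft still not finalized as of this date

# Calendar days between the planned date and the actual (or status) date.
vision_slip_days = (vision_done - vision_due).days
conops_slip_days = (status_date - conops_due).days  # lower bound on the slip

print(vision_slip_days, conops_slip_days)  # 48 60
```

On these dates, the vision statement was completed 48 calendar days after its due date, and the concept of operations was at least 60 calendar days overdue.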
In addition, the ERA program management office is behind schedule on its efforts to develop the plans and guidance to strengthen its capability for managing the acquisition and deployment of ERA. In July 2001, with the help of its systems development and acquisition contractor, the office began focusing on developing these plans and procedures. We tracked planned and actual completion dates for 13 policy and planning documents that the program management office needs in order to develop and acquire a major system (according to NARA and its contractor). To date, however, only 7 of the 13 documents have been completed.[Footnote 20] The 7 that have been delivered were late by an average of over 2 months. The initially planned delivery dates of the other 6 documents have passed; on average these are late by almost 4 months.[Footnote 21] Besides the approach taken to constructing the schedule, another contribution to schedule slippage may be NARA’s slow start in hiring full-time government staff for the ERA program management office. For fiscal year 2002, NARA was authorized 16 positions for the ERA program office. However, as of April 2002, NARA had only 5 full-time staff on board. NARA Is Strengthening IT Management Capabilities, but These Efforts Are Incomplete: Acquiring a major IT system such as the planned electronic archival system is a significant challenge for a relatively small organization like NARA, whose IT management capabilities are relatively limited. In its fiscal year 2002 budget hearings, NARA indicated that it must strengthen its IT management capabilities and infrastructure to support the ERA program, and NARA is currently taking steps to do so in three key areas: IT investment management, enterprise architecture, and information security. None of these efforts, however, is yet complete. 
Sound IT Management Capabilities Contribute to Success in Acquiring IT Systems: IT investment management provides a systematic method for agencies to minimize risks while maximizing the return on investments. The Clinger-Cohen Act requires agency heads to implement a process for maximizing the value and assessing and managing the risks of an agency’s IT investments. Our research of leading private and public sector organizations’ IT management practices indicates that effective investment management requires the use of defined and disciplined investment management processes. An enterprise architecture provides a description--in useful models, diagrams, and narrative--of the mode of operation for an agency. It describes the agency in both (1) logical terms, such as interrelated business processes and business rules, information needs and flows, and work locations and users; and (2) technical terms, such as hardware, software, data, communications, and security attributes and standards. An enterprise architecture provides these perspectives both for the current environment and for the target environment, as well as a transition plan for sequencing from the current to the target environment. Managed properly, an enterprise architecture can clarify and help optimize the dependencies and relationships among an agency’s business operations and the underlying IT infrastructure and applications that support these operations. Information security is an important consideration for any organization that depends on information systems to carry out its mission. Our study of security management best practices, as summarized in our 1998 executive guide,[Footnote 22] found that leading organizations manage their information security risks through an ongoing cycle of risk management. 
This management process involves (1) establishing a centralized management function to coordinate the continuous cycle of activities while providing guidance and oversight for the security of the organization as a whole, (2) identifying and assessing risks to determine what security measures are needed, (3) establishing and implementing policies and procedures that meet those needs, (4) promoting security awareness so that users understand the risks and the related policies and procedures in place to mitigate those risks, and (5) instituting an ongoing monitoring program of tests and evaluations to ensure that policies and procedures are appropriate and effective. NARA Is Improving Its IT Investment Management Processes: The Clinger-Cohen Act of 1996 requires agencies to establish an IT investment process that provides the means for senior management to obtain timely information regarding the progress of investments in an information system, including a system of milestones for measuring progress in terms of cost, timeliness, quality, and the capability of the system to meet specified requirements. Weak IT investment management processes significantly increase the risk that agency funds and resources will not be efficiently expended. The first step toward establishing effective investment management is putting in place foundational, project-level control and selection processes. These foundational processes allow the agency to identify variances in project cost, schedule, and performance expectations; to take corrective action, if appropriate; and to make informed, project-specific selection decisions. The second major step toward effective investment management is to continually assess proposed and ongoing projects as an integrated and competing set of investment options. 
This portfolio management approach enables the organization to consider the relative costs, benefits, and risks of new and previously funded investments and thereby identify the mix that best meets its mission, strategies, and goals. NARA’s IT investment management policies and processes were assessed and reported on by its inspector general (IG) in April 2000. The report identified several strengths in NARA’s IT investment management processes, including having an IT investment board, a defined process for selecting projects, criteria to be applied in considering whether to undertake a particular IT investment, ratings of each investment’s breadth of impact, and a requirement that the net benefits and risks be identified for proposed investments. However, the IG also identified weaknesses and made 13 recommendations for strengthening NARA’s IT investment management processes. NARA concurred with all recommendations. While it has to date fully addressed only 2 of the recommendations, it plans to resolve the remaining 11 issues by September 30, 2002. While NARA’s investment management process has several strengths and NARA continues to improve process weaknesses, NARA has yet to complete its efforts to establish a mature investment management capability. Lacking a fully mature investment management process increases the risk that the electronic archival system will not be implemented on time and within budget, and that crucial resources and funds for meeting the electronic records challenges will not be invested effectively and efficiently. Specifically, if NARA management’s oversight of the ERA program is not based on complete information (including comparisons of the actual cost and schedule to the estimated cost and schedule, as well as identification of project risks and benefits), the risk is increased that NARA management will not be able to determine whether the ERA program is having schedule or other problems and ensure that corrective actions are taken. 
NARA Is Developing an Enterprise Architecture: The importance of enterprise architecture development, implementation, and maintenance is a basic tenet of effective IT management. Used in concert with other IT management controls, an enterprise architecture can greatly increase the chances for optimal mission performance. We have found that attempting to modernize operations and systems without an enterprise architecture leads to operational and systems duplication, lack of integration, and unnecessary expense. Over the past several years, NARA has taken action to develop an enterprise architecture. NARA has drafted a current architecture and is working on a target architecture, but this work is incomplete.[Footnote 23] However, the process to develop the electronic archival system is well under way. Without an enterprise architecture to guide its development, NARA increases the risk that the planned electronic archival system will be incompatible with existing and future operations and systems, thus wasting resources and requiring that unnecessary interfaces be built to achieve integration. NARA Is Improving Information Security, but Has Not Yet Completed Key Tasks: NARA is currently strengthening its information security, having recognized that it has numerous weaknesses. Significant security weaknesses were identified by two IG assessments (conducted in fiscal years 2000 and 2001) and a NARA-initiated vulnerability assessment of its network (performed concurrently with the IG assessments). 
As a result of these assessments, the Archivist of the United States declared information security a material weakness in fiscal year 2000.[Footnote 24] Actions taken by the Archivist to address these shortcomings and respond to recommendations identified in the reports include establishing an information security program, updating and developing new security policy documents, developing contingency plans and business recovery plans, and strengthening firewalls across the network to control inbound and outbound traffic. NARA said that it would implement the IG’s recommendations by June 28, 2002, and by the end of fiscal year 2002 it plans to have rectified the shortcomings that led to its information security being declared a material weakness. However, although NARA is making progress in strengthening its information security, two additional weaknesses could affect the ERA program. First, NARA currently lacks a program for assessing agencywide information security risks. Federal guidance requires all federal agencies to establish comprehensive information security programs based on assessing and managing risks.[Footnote 25] Risk assessments provide a basis for establishing appropriate policies and selecting cost-effective techniques to implement these policies. NARA intends to develop an agencywide risk assessment capability in fiscal year 2003, but it is not clear that this will allow vulnerability assessments to be completed before ERA is developed. Without a method to identify and evaluate risks, NARA cannot be assured that it has effective mechanisms for protecting its information assets: networks, systems, and information associated with ERA. Because a compromise of security in a single poorly secured system can undermine the security of multiple systems, NARA needs to complete vulnerability assessments of all systems that will interface with ERA. Second, because NARA lacks an enterprise architecture, it may have difficulty addressing agencywide security. 
Federal guidance calls for agencies to make security controls for systems consistent with and an integral part of the enterprise architecture of the agency.[Footnote 26] Without an enterprise architecture that addresses security issues agencywide, NARA cannot be sure that its current or future archiving systems are adequately protected. These weaknesses may be particularly significant for ERA, because this system presents security issues that NARA has never before addressed, according to an initial assessment report on ERA prepared by NARA’s systems development and acquisition contractor.[Footnote 27] The proposed distributed structure of ERA introduces the security risks associated with the Internet--threats to the integrity of data and to data accessibility. According to the Federal Bureau of Investigation, Internet systems are threatened by hackers (who may be terrorists, transnational criminals, and intelligence services) using information exploitation tools such as computer viruses, worms, Trojan horses, logic bombs, and eavesdropping sniffers.[Footnote 28] As Internet usage increases, the Internet has become an increasingly tempting target, and the number of reported Internet-related security incidents is growing.[Footnote 29] The effect on ERA of the vulnerabilities of the Internet would have to be assessed and addressed. Conclusions: In response to the challenges associated with managing and preserving electronic records, NARA has performed an assessment of governmentwide records management--an important first step that identified several problems, including the inadequacy of guidance on electronic records, the low priority generally given to records management, and the lack of technology tools to manage electronic records. While NARA has plans to improve its guidance and address the need for technology, it has not yet formulated a strategy to deal with the stature of records management programs across government. 
Further, it has no strategy for acquiring the kind of comprehensive information on records management that would be provided by systematic inspections and evaluations of federal records programs. Without such a strategy, records management will likely continue to be considered a low-priority “support” activity lacking appropriate management attention, and NARA will not acquire information needed to address problems in agency records management and guidance. Inadequacies in records management put at risk records that may be valuable: records providing information on essential government functions, information that is necessary to protect government and citizen interests, and information that is significant for the historical record. NARA’s effort to acquire an advanced electronic records archive is at risk. NARA is not meeting its schedule for the ERA system, largely because of flaws in how the schedule was developed. As a result, the schedule will be compressed, leaving less time for completing essential planning tasks. In addition, NARA has not yet improved IT management capabilities that would reduce the risks inherent in its effort to acquire ERA. Without these capabilities, NARA risks spending funds to acquire a system that does not meet mission needs and requirements, effectively work with existing systems, or provide adequate security over the information it contains. Recommendations for Executive Action: To address the low priority given to records management programs across government, we recommend that the Archivist of the United States develop a documented strategy for raising agency senior management awareness of and commitment to records management principles, functions, and programs. 
Further, we recommend that the Archivist develop a documented strategy for conducting systematic inspections of agency records management programs to (1) periodically assess agency progress in improving records management programs and (2) evaluate the efficacy of NARA’s governmentwide guidance. To mitigate the risks associated with the acquisition of an advanced electronic archival system, we recommend that the Archivist reassess the ERA project schedule. A revised schedule should be developed, based on estimates of the amount of work and resources required to complete each task, that allows sufficient time for NARA to:
* complete essential planning tasks and:
* strengthen its IT management capabilities by (1) implementing an IT investment management process, (2) developing an enterprise architecture, and (3) improving information security.
Agency Comments and Our Evaluation: In written comments on a draft of this report, which are reprinted in appendix V, the Archivist of the United States generally agreed with our recommendations but provided clarifications concerning records management priority, inspections, and the ERA schedule. NARA also provided technical comments, which we have incorporated as appropriate. The Archivist agreed with our recommendation that NARA develop a strategy for raising agency senior management awareness of and commitment to records management principles, functions, and programs, adding that the responsibility for oversight of records management is not NARA’s alone, but is shared by the Office of Management and Budget (OMB), the General Services Administration (GSA), and the heads of federal agencies. Further, he acknowledged that more needs to be done to have a major effect on agency leadership. The Archivist, however, disagreed with our conclusion that NARA does not plan to address the low priority generally given to records management. 
Our conclusion was not meant to imply that NARA does not intend to address the priority of records management. We acknowledge NARA’s past efforts to raise awareness of the importance of records management and its stated plans to further address this issue. Instead, our conclusion reflects the fact that NARA’s written plan to reform federal records management policies and practices--which NARA refers to as its Records Management Initiatives--does not currently address this issue. We believe that to be successful, NARA must document its plans to address the low priority of records management programs across government, including specific goals, strategies, and milestones. Such a plan is critical in ensuring concurrence on planned actions among the key players that NARA mentions, including federal agencies, GSA, and OMB; that appropriate resources are assigned; and that NARA has the means to track progress against its goals. The Archivist also agreed with our recommendation that NARA develop a strategy for conducting systematic inspections of agency records management programs, but noted that continuing its past inspection program, as cited in the report, would not succeed. NARA disagreed with our conclusion that it has no plans to address the issue of records management inspections, noting that it plans to use risk management analysis while leveraging its inspection resources. The Archivist said that this approach would include an assessment of broad categories of important records across agencies, agency-specific interventions, and the use of NARA’s authority to report the results of evaluations of at-risk records to OMB and the Congress. We are not suggesting that NARA resurrect its past inspection program, which it concluded was basically flawed. However, we also do not believe that NARA’s current targeted assistance approach is an appropriate substitute for systematic inspections and evaluations of federal records programs. 
In regard to our conclusion, it is again based on the fact that the written strategy for the Records Management Initiatives does not address the need for systematic inspections. We acknowledge NARA’s statement that it plans to use a risk-based approach to addressing this issue, but we reiterate the need for a documented plan with associated goals, strategies, and milestones. In commenting on our recommendation that NARA reassess the ERA project schedule, the Archivist stated that such a reassessment is prudent and that NARA intends to conduct such reassessments repeatedly, both periodically from an overall program management viewpoint and on a continuing basis as part of its ERA risk management activity. The Archivist noted that NARA is currently reassessing the schedule as part of its refinement of the ERA acquisition strategy, and that this reassessment will address the issues raised in our report. Regarding the schedule for the ERA system, the Archivist noted that while some program documentation was not completed on schedule, all items on the ERA project’s “critical path” have been completed on time, and NARA expects to meet all milestones on the critical path this year. We disagree. As discussed in our report, the development of key program documents--such as the ERA vision statement and the concept of operations--was affected by delays. For example, the ERA vision statement, planned for completion on March 1, 2002, was not completed until April 18, 2002, about 7 weeks late. Similarly, the concept of operations, which was due on April 1, 2002, and which NARA documentation shows as being on the critical path, was delivered in draft form on that date and had not been finalized as of May 31. Falling behind schedule in the initial stages presents risks to successful and timely completion of the ERA project and is one of the reasons we are recommending that the agency reassess its schedule. 
The Archivist also disagreed with our conclusion that if the results of the two National Academy of Sciences assessments are not fully reflected in the ERA requirements, there is added risk that the technical strategy underlying the development of the system will prove not to be optimal, and that alternatives will not have been considered. The Archivist noted that NARA should receive the first National Academy of Sciences report at a time when it expects to receive the industry’s response to NARA’s request for information, and that the report will provide an unbiased, expert view of the feasibility of building a system that is inherently evolutionary, addressing the core problem of digital preservation. According to the Archivist, NARA will factor both the scientific and the industry views into its articulation of a draft request for proposals. In regard to the second National Academy of Sciences report, the Archivist noted that its primary purpose is to provide input to NARA’s long-range plans for addressing the continuing evolution of information technology and electronic records, and that the report will be useful in revising the ERA research plan to address new problems and opportunities identified by the experts, and in plans for successive builds of the ERA system. We acknowledge NARA’s clarification regarding the timing and use of the two NAS studies and believe this approach should assist in developing a system that will meet mission needs. Accordingly, we have revised our recommendation to reflect this. We are sending copies of this report to the Ranking Minority Member, Subcommittee on Government Efficiency, Financial Management and Intergovernmental Relations, House Committee on Government Reform, and to the Ranking Minority Member, Subcommittee on Treasury, Postal Service and General Government, House Committee on Appropriations. 
We are also sending copies to the Archivist of the United States, the Secretary of Housing and Urban Development, the Secretary of State, the Secretary of Commerce, the Secretary of Veterans Affairs, and the Administrator of NASA. This report will also be available on GAO’s home page at http://www.gao.gov. If you have any questions concerning this report, please call me at (202) 512-6240 or Mirko J. Dolak, Assistant Director, at (202) 512-6362. We can also be reached by E-mail at koontzl@gao.gov and dolakm@gao.gov, respectively. Key contributors to this report were Timothy Case, Barbara Collier, Jamey Collins, David Plocher, and Megan Savage. Linda D. Koontz Director, Information Management Issues: Signed by Linda D. Koontz: Appendix I: Objectives, Scope, and Methodology: Our objectives were to:
* determine the status of NARA’s efforts to respond to governmentwide electronic records management problems and the adequacy of its future plans and:
* assess NARA’s efforts to acquire an archival system for electronic records.
As part of our assessment of NARA’s efforts to acquire an electronic records archiving system, we were also asked to identify alternative technologies under consideration for the long-term preservation of electronic records. 
To determine the status of NARA’s efforts to assess and respond to governmentwide electronic records management problems and the adequacy of its future plans, we reviewed federal legislation and NARA records management guidance, available studies, and reports; surveyed NARA’s appraisal archivists working with federal agencies; reviewed records management activities and obtained the views of record managers in selected federal agencies managing large volumes of electronic records--the Departments of State, Commerce, Housing and Urban Development (HUD), and Veterans Affairs (VA), as well as NASA and the Patent and Trademark Office; and reviewed legal challenges to federal electronic recordkeeping practices, including Public Citizen v. John Carlin and Scott Armstrong v. Executive Office of the President. We also reviewed NARA’s documentation of its effort to redesign its approach and guidance for the management of electronic records. As part of this effort, we investigated whether agencies are scheduling their major information systems and the related databases; to do so, we asked five major agencies--Commerce, HUD, VA, State, and NASA--what portion of their major information systems were scheduled and placed under the agency records management program. We based our assessment on the inventory of Year 2000 mission-critical systems reported by 24 major agencies to the Office of Management and Budget.[Footnote 30] In addition, to determine the status of the Library of Congress’ National Digital Information Infrastructure and Preservation Program and its relationship to NARA’s efforts to design and acquire an advanced electronic archival system, we discussed the program’s objectives and schedule with Library of Congress officials. 
To assess NARA’s efforts to acquire an archival system for electronic records, we reviewed agency and contractors’ documentation for the electronic records archive (ERA) program, including program and project phasing; on the basis of federal requirements and information industry practice, we assessed NARA’s effort to develop or enhance its information technology capabilities, including information technology investment management, enterprise architecture, and information security. To identify alternative technologies under consideration for the long-term preservation of electronic records, we reviewed archival studies and literature, and we surveyed selected digital preservation approaches used by the information industry and selected national governments. In addition, we contacted the archives of three judgmentally selected foreign countries (Australia, Canada, and the United Kingdom) that had been identified by records management professionals as using advanced electronic records management and that we had previously reviewed.[Footnote 31] We also contacted the Public Record Office of Victoria, Australia; although this archive is not at the scale of a national archive, we included it because it has employed a unique technological approach to archiving electronic records. We performed our work from June 2001 to May 2002 in accordance with generally accepted government auditing standards. [End of section] Appendix II: Approaches to Archiving Electronic Records Provide Partial Solutions: The challenge of managing and preserving the vast and rapidly growing volumes of electronic records produced by modern organizations is placing pressure on archives and on the information industry to develop a cost-effective long-term preservation strategy that will free electronic records from the constraints of proprietary file formats and software and hardware dependencies. 
Part of this strategy will involve ways to capture and use information about the records to make them accessible, as information in card catalogs does in traditional libraries. After considerable research in this area, some agreement is being reached on the metadata (data about data) required for preserving electronic records, and some practical applications are using XML (Extensible Markup Language[Footnote 32]) for creating such metadata. However, there is no current solution to the electronic records archiving challenge, and so archival organizations now rely on a mixture of evolving approaches that generally fall short of solving the long-term preservation problem. The four most common approaches--migration, emulation, encapsulation, and conversion--are in use or under consideration by the major archives. NARA is supporting the investigation of a new approach involving records conversion (known as persistent object preservation), but this has yet to mature. Recognizing that archival solutions may be some time off, companies in the information industry are relying on off-the-shelf technology for providing access to billions of electronic records. These commercial archives, however, concentrate on electronic records of types that are relatively uniform in comparison to those that a government archive must address. Archiving Requires Documentation of Attributes and Relationships of Records: Archives use catalogs of various types to capture information about records, information that is critical for sharing, storing, managing, and accessing records effectively--particularly in the context of millions of records. Because such information is data containing descriptive information about other data, it is referred to as metadata. Metadata are a central element of any approach to ensure that preserved records are functional. 
For electronic records, the metadata needed are often more extensive than information in traditional catalogs, including information that is important for preservation. Metadata Provide Information Necessary to Describe Electronic Collections: The creation of accessible software- and hardware-independent electronic records requires that all materials that are placed in archives be linked to information about their structure, context, and use history. Metadata to be associated with electronic records may include information about:
* the source of the record;
* how, why, and when it was created, updated, or changed;
* its intended function or purpose;
* how to open and read it;
* terms of access, and:
* how it is related to other software and records used by the originating organization.
These metadata must be sufficient to support any changes made to records through various generations of hardware and software, to support the reconstruction of the decisionmaking process, to provide audit trails throughout a record’s life cycle, and to capture internal documentation. Without an adequately defined metadata structure, an effective electronic archive cannot be constructed. Numerous research projects have examined the question of defining metadata that would be sufficient to ensure digital preservation. Although archives experts note that unresolved issues remain, the work on preservation metadata is beginning to move from the research area to practice. The Public Record Office Victoria (Australia), a state archive, has published standards for the management of electronic records that include a metadata model originally developed by the National Archives of Australia. For incorporating metadata, the Victoria archive mandates the use of XML. XML is being actively considered by archives and researchers as a promising approach to generating metadata. 
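As an illustration only, the metadata categories listed above could be recorded in XML along the following lines. This is a minimal sketch in Python using the standard xml.etree.ElementTree module; the element names (record, source, created, and so on) and all field values are hypothetical, invented for this example, and are not drawn from the Victoria standard, the National Archives of Australia model, or any NARA specification.

```python
import xml.etree.ElementTree as ET


def build_record_metadata(fields):
    """Return a <record> element with one child element per metadata field."""
    record = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(record, name)
        child.text = value
    return record


# Hypothetical field names and values, one per metadata category in the list above.
fields = {
    "source": "Example Agency, Office of Correspondence",
    "created": "2002-01-15",
    "purpose": "Official correspondence",
    "readingInstructions": "Plain text file, no special software needed",
    "accessTerms": "Unrestricted",
    "relatedRecords": "Correspondence series A",
}

record = build_record_metadata(fields)

# Serialize to text; the result is an ordinary text file that any system can read.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Because the tags carry meaning, software can retrieve a field by name
# without knowing anything about the program that created the record.
parsed = ET.fromstring(xml_text)
created = parsed.findtext("created")
print(created)
```

The point of the sketch is that the serialized metadata are plain text with descriptive tags, so a field such as the creation date can be located by name on any platform that can parse XML.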
XML Enables Infrastructure-Independent Description of Electronic Records: XML is a flexible, nonproprietary set of standards for annotating (“tagging”) data with semantically rich labels that permit computers to process files on the basis of their meaning.[Footnote 33] Like the more familiar HTML (Hypertext Markup Language) files used on the World Wide Web, XML files can be easily transmitted via the Internet, and with appropriate software, they can be displayed by Web browsers. The difference is that HTML is used only for telling computers how to display information for a human being to view, whereas the semantically based XML tags allow computers to automatically interpret and process XML files. XML is called extensible because it is not a fixed format. Instead, XML is actually a “metalanguage”--a language for describing other languages--which allows the design of customized markup languages for limitless different types of documents. Thus, although in the beginning stages of adoption, XML is viewed as a promising format for a wide range of applications.[Footnote 34] Several XML attributes make it attractive for archive applications. The semantic nature of XML tags makes XML suitable for recording metadata. Its extensibility would allow archives to expand their systems to accommodate evolving needs. As an open standard, it reduces the problems of proprietary software. Further, because they are basically text files, XML files can be readily interpreted by disparate computer systems. Even without the mediation of software, human beings can interpret an XML-tagged file, because XML tags are human readable (see fig. 4). This quality allows them to be preserved both on computer media and on paper (so that they would be readable both by human beings and automatically through optical character recognition). Figure 4: Sample of XML Version of State Department Telegram: [See PDF for image] Source: San Diego Supercomputer Center. 
[End of figure] Figure 4 is an example of a text document--a World War II vintage telegram in the Franklin D. Roosevelt library--converted to XML format.[Footnote 35] The XML “tags” provide the means for identifying--and retrieving--key pieces of information, such as date sent, addressee, and place of sender. If the file were viewed in an XML-compliant Web browser, the tags in the telegram would not be visible, and the telegram itself could be displayed in various ways for the convenience of the human reader. At the same time, the presence of the tags permits computer systems to perform powerful searches and exchange data. XML is also used by the National Archives of Australia,[Footnote 36] which converts files from their native formats to XML versions, while retaining a copy of the original source file. The Australian archives has also developed a metadata model, but it has not yet determined its final preservation metadata requirements. Electronic Archives Take Combinations of Approaches to Preservation: For long-term preservation of electronic records, electronic archives must address the problems of obsolescence and aging of storage media, the dependence of electronic records on the software and hardware on which they were created, the complexity of electronic records, and the massive volumes of records created by often decentralized systems. According to one archival expert, a viable strategy for long-term preservation for electronic records would call for “a long-lived solution that does not require continual heroic effort or repeated intervention of new approaches every time formats, software, or hardware paradigms, document types, or recordkeeping practices change.”[Footnote 37] Since no one solution is yet available that addresses all the problems, most archives and other institutions that preserve records use a variety of approaches, often in combination. 
The current approaches for dealing with the technical issues associated with long-term electronic archiving are: * technology preservation--maintaining old technologies to allow access to old formats; * emulation--using software running on new-technology platforms to mimic old technologies; * migration--transferring digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation;[Footnote 38] * encapsulation--grouping together a digital object with other information necessary to provide access to that object; and: * conversion to standard formats--transforming records into objects that are relatively software and hardware independent. The recent development of durable analog storage media (that is, media that preserve images of human-readable documents, much as microfiche does) suggests the possibility of approaches that combine those above with the use of analog rather than digital media.[Footnote 39] Technology Preservation Is a Short-Term Solution Only: Technology preservation refers to the practice of maintaining outdated equipment well after it is useful in everyday business processes. Under this approach, electronic files or records, which are saved in their native formats, continue to be accessible through the use of original hardware and software. In the short term, this is a simple and cost- effective approach, and some organizations do maintain older information systems only to be able to access their records.[Footnote 40] However, this approach is at best an interim solution to the problem of the dependence of electronic records on the software and hardware on which they were created. The solution eventually fails, because maintaining the original technology grows increasingly difficult and costly with the passage of time. Further, it does not solve the problem of aging and obsolescent storage media, which would also grow more difficult if not impossible to replace. 
Issues of cataloging and metadata are also not addressed by this approach. With the seemingly endless introduction of new hardware and software, the sheer number of differing formats and applications, and the cost to maintain any and all systems, technology preservation is not a feasible strategy for the long term. Emulation Is Currently More Theoretical Than Practical for Electronic Archiving: A proposed approach to the problem of software and hardware dependence is emulation, which aims to preserve the original software environment in which records were created. Emulation software mimics the functionality of older software (generally operating systems) and hardware. Under the emulation approach, data files are stored along with copies of the creating software as well as software that emulates the hardware/operating system required to run the software.[Footnote 41] This technique seeks to recreate a digital document’s original functionality, look, and feel by reproducing, on current computer systems, the behavior of the older system on which the document was created. In other words, an emulation strategy means that nothing is done to the original electronic file; rather, the original environment is recreated. Since the original file remains unaltered, emulation also offers a solution to the problem of preserving the original functionality and the “look and feel” of complex digital files. Emulation has been in practical use on computer systems for many years: * IBM mainframes emulate previous mainframes in order to support legacy systems and allow several generations of operating system versions to be run. * Operating system emulators allow a single computer to provide more than one operating environment (such as Macintosh and Windows). * Emulation software allows desktop computers to run video games and legacy video gaming systems. However, according to one archival expert, emulation has not yet been applied to preserving archival documents in any systematic way. 
Although emulation could in theory be part of a solution to the problem of hardware and software dependence, it is just beginning to be explored as an archival approach. Emulation is under consideration as one of various archiving approaches by the United Kingdom’s Public Record Office.[Footnote 42] One problem unique to emulation is that intellectual property rights issues may be involved when either operating systems or applications are emulated.[Footnote 43] Even if the software and hardware are obsolete, their copyrighted specifications are not likely to be released for the benefit of archival integrity. Further, the use of an emulated operating system or application introduces outmoded programs into a modern environment, requiring users to understand how to use them; in other words, using the old software may require expert knowledge of the outdated systems--knowledge that is likely to disappear. Other problems with emulation include the increasing possibility that software failures will occur as the old systems continue to age and the pool of expertise concerning them shrinks. Emulation assumes that the emulated software will continue to run without maintenance. As the year 2000 date conversion problem showed, this is not a safe assumption, because software may contain bugs that eventually cause catastrophic loss of information.[Footnote 44] Further, an emulation approach depends on several components working together (the emulation software, the original application, and the data); as the number of components increases, so does the risk of failure.

Migration of Both Media and File Formats May Preserve Records:

Migration refers to the periodic transfer of digital materials from one format configuration to another, or from one generation of computer technology to a subsequent generation.
In the context of archiving, migration can refer both to the media on which information resides (conversion from older to newer media or forms of media) and to the formats in which it is encoded (conversion from one file format or system to another). The first type of migration, media migration, has been so far unavoidable: it is the standard approach to the problem of media obsolescence and aging. In media migration, records are moved from older storage media to newer media, either to avoid the obsolescence or decay of an older medium or to upgrade to a more advanced medium (often to increase storage capacities while reducing cost). However, media migration alone does not ensure that the electronic records transferred to the new media continue to be accessible, especially if their format is obsolete. As new storage technologies evolve--including extreme-longevity analog media such as the High Density Rosetta disk discussed later in this appendix--the migration process may become less frequent and more efficient. The second type of migration, format migration, is a process of preservation by conversion: specifically, format migration is defined as rearranging the original sequence of structural and data elements of a file to conform to another configuration. Such migration occurs whenever older systems and formats are displaced by newer, often more advanced systems and formats. Many organizations have, for example, converted old database systems to newer systems, and in the process they have converted the formats of the records they contain. The major difficulty with format migration is the risk of altering records during conversion from the source to the target format. For conversions to be successful, those performing the transition must have knowledge of the original application and data formats,[Footnote 45] and the more complex the file structure, the more important this knowledge is.
Whether the application is commercial or generated in house, over time this knowledge may be lost and with it the ability to perform a successful migration. For such reasons, migration has been described as cost effective only for certain types of records that remain in operational use.[Footnote 46] For records in use, problems with imperfect conversion are more likely to be discovered by users, and organizational resources are more likely to be devoted to ensuring that these are resolved or mitigated. Further, although format migration has occurred in many contexts in the past, it has not been extensively used in archiving. Most electronic archives are relatively new, so they are dealing with records in current formats created by systems that are still operational. Thus, they have not yet experienced the need to incorporate format migration into their processes. Rather, they treat migration as a future option for dealing with preserving the types of records that they are currently storing. As a strategy for the long-term preservation of electronic records, relying on format migration is risky. Migration as a preservation strategy would have to be a continuous process, with conversions occurring whenever a new format needed to be introduced. With each format conversion, the possibility of loss would be increased, and the more complex the record, the greater the possibility of loss. Thus, migration is at best an imperfect solution as it can potentially lead to the loss of record integrity. Migration was selected by the United Kingdom’s Public Record Office as its current archival approach. In addition to migration, the Public Record Office is also considering using emulators and viewers to access archived files in their native formats.
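The way risk compounds across repeated format conversions can be made concrete with a short calculation. The Python sketch below assumes, purely for illustration, that each conversion independently alters a given record with a fixed probability; neither the independence assumption nor the sample probability of 0.02 comes from this report, and the function name is hypothetical.

```python
def survival_probability(p_loss_per_migration: float, migrations: int) -> float:
    """Probability that a record is unaltered after repeated conversions.

    Assumes each conversion alters the record independently with the
    same probability, which is a simplification made for illustration.
    """
    if not 0.0 <= p_loss_per_migration <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if migrations < 0:
        raise ValueError("number of migrations cannot be negative")
    return (1.0 - p_loss_per_migration) ** migrations


# Illustrative values only: a 2 percent chance of alteration per conversion.
for n in (1, 5, 10, 20):
    print(n, round(survival_probability(0.02, n), 3))
```

Under these assumptions the chance that a record survives intact falls with every additional conversion, which is the reason a strategy requiring continuous migration carries cumulative risk.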
Encapsulation Preserves Both Records and Information about Records:

Encapsulation is the combining of several elements to create a new single entity; in the context of archiving, the elements would be the records themselves, metadata identifying and describing the records, and possibly other elements (such as viewers enabling the records to be read).[Footnote 47] Unlike migration, encapsulation does not necessarily involve a change in the original file format. If the format is unchanged, encapsulation would avoid the problem of loss of integrity that migration entails. Leaving records in their native formats would leave open the possibility of processing the objects with the original software, and it would also permit subsequent transformation of the encapsulated records using methods that were not available when the records were originally placed into the archives.[Footnote 48] Encapsulation is currently being used by the Victoria Public Records Office in Australia.[Footnote 49] The Victoria archive uses XML to encapsulate records along with standardized metadata describing each record in a Victorian Electronic Records Strategy (VERS) format.[Footnote 50] The VERS format mandates the use of XML to describe and encapsulate records. However, the Victoria archive has only recently begun applying its process, and its electronic records collection is as yet small (described as “a few records”), so it is premature to judge its effectiveness for large-scale, long-term preservation.

Conversion to Standard Formats Makes Records Less Dependent on Hardware and Software:

Conversion transforms records into standard text formats such as ASCII[Footnote 51] or XML to increase their independence from hardware and software.
This approach is currently used by the National Archives of Canada[Footnote 52] and by NARA (both of which accept databases in ASCII format), as well as the National Archives of Australia,[Footnote 53] which converts files from their native formats to XML, while retaining a copy of the original source file. The Victoria archives is using a combination of conversion and encapsulation in its preservation approach, because before encapsulating selected types of documents, it is requiring their conversion (where appropriate) to Adobe Systems’ Portable Document Format (PDF). PDF is a compact format that preserves all the fonts, formatting, graphics, and color of any source document, regardless of the software and hardware used to create it. Although PDF is a proprietary file format, PDF files can be shared, viewed, navigated, and printed exactly as intended by anyone with the freely distributed Adobe Acrobat Reader. The primary shortcomings of the conversion approach are the limitations and the longevity of the selected standard.[Footnote 54] For example, converting databases to ASCII format limits their usefulness: the conversion of a relational database to flat ASCII database tables will eliminate the embedded information about the relationships among data elements.[Footnote 55] Conversion to XML, on the other hand, may involve fewer such limitations, but it depends on the XML standard remaining in use and accessible. NARA is investigating an advanced form of conversion combined with encapsulation known as persistent object preservation (POP). Under this approach, records are converted by XML tagging and then encapsulated with metadata. According to NARA, the persistent object transformation approach would make electronic records self-describing in a way that is independent of specific hardware and software. The architecture for POP is being developed through the National Partnership for Advanced Computational Infrastructure. 
The partnership is a collaboration of 46 institutions nationwide (including NARA) and 6 foreign affiliates, with the San Diego Supercomputer Center serving as the technical resource. According to NARA, persistent object preservation would accommodate preservation of persistent but evolving collections by providing the ability to dynamically reconstruct data collections on new technology. The result would be a system that could upgrade individual technical components and migrate media while safeguarding the archived records. POP would thus not only enable the use of future, advanced technologies, it would also reduce threats to integrity and authenticity, because POP would not require changes in the preserved data. However, POP may not be sufficiently mature to be translated into system design. Migration to Durable Analog Media May Offer Hybrid Approach: An archive that stores records digitally must use media migration as a preventive measure to avoid decay and obsolescence. However, the use of analog storage offers a possible alternative that may diminish the need for media migration. Whereas all current media now record digital information as 0’s and 1’s, analog storage of documents is suggested by a new product, called a High Density Rosetta, developed by Norsam Technologies (see fig. 5). Figure 5: The Long Now Foundation Rosetta Disk Language Archive: [See PDF for image] Source: Rolfe Horn, courtesy of the Long Now Foundation. [End of figure] The nickel-plated disk, which has a life expectancy that is orders of magnitude longer than current electronic media,[Footnote 56] allows the analog storage of information and images that are readable via an electron or optical microscope. Such a medium could avoid the obsolescence created by software-reliant media. 
The plates are physically inscribed by an ion beam, through a process known as ion milling.[Footnote 57] This medium can store on each side of its 2-inch plate over 196,000 pages (with electron microscope retrieval) or 5,000 to 18,000 pages (with optical microscope retrieval). Using a text-based coding system such as XML would permit both coded (software readable) and image (human readable) information to be stored on this long-lived medium. The migration issue would then arise if new software were to be adopted, but the image information would persist. The High Density Rosetta is being used by the Long Now Foundation to create an extreme-longevity archive of selected languages.[Footnote 58] According to the foundation, 50 to 90 percent of the world’s languages are predicted to disappear in the next century, many with little or no significant documentation. As part of the effort to secure this critical legacy of linguistic diversity, the foundation initiated the Rosetta Project,[Footnote 59] an effort to develop a contemporary version of the historic Rosetta Stone. The project’s goal is the development of a permanent archive of 1,000 languages. For storage of this archive, the project is using the High Density Rosetta to micro-etch text of archived languages at a scale readable by a 1,000-power optical microscope.

Information Technology Industry Relies on Off-the-Shelf Technologies to Provide Access to Electronic Collections:

While government and academic institutions are searching for a permanent solution to electronic records archiving problems, the private sector, also concerned about and affected by the potential loss of electronic records, relies on existing information architectures and off-the-shelf technologies to make accessible massive volumes of electronic records dating back over two decades.
These archiving achievements do not meet the rigorous requirements for permanence and authenticity that are demanded by a government archive, nor are their owners required to process, store, and access the full range of complex file formats encountered by governments. However, they do illustrate the capability to provide storage and access to large quantities of data. Two of the most notable private sector efforts are the Internet Archives and the Google archive of Usenet messages. Internet Archives: The Internet Archives has created a digital library of Internet sites and other born-digital cultural artifacts. It is attempting to archive the entire publicly available Web, offering free access to researchers, historians, scholars, and the general public. Anyone with access to the Internet can, through the Internet Archives Web site,[Footnote 60] navigate the Web at any moment in time from 1996 to the present. This collection of Web pages contains over 100 terabytes, or 10 billion Web pages, and it is currently growing at a rate of 12 terabytes per month. The stored and accessible 100 terabytes is larger than the amount of data contained in the world’s largest libraries, including the Library of Congress, making it the largest known database in existence. Without the efforts of the Internet Archives, these 10 billion Web pages might have been lost. As it is, they provide a record of the origins and evolution of the Internet, as well as a reflection of societal interests and opinions at different moments in time. This is particularly true in the case of Web sites such as those of presidential candidates (see fig. 6) and of monumental events such as the September 11 attacks, both of which have prominence on the Internet Archives Web site as “Special Wayback Collections.”: Figure 6: Internet Archive Collection of Presidential Candidate Web Sites: [See PDF for image] Source: Internet Archives. 
[End of figure] According to the Internet Archives, it has achieved inexpensive storage on a major scale: it uses off-the-shelf technology at a cost of about $4,000 per terabyte. As a preservation strategy, the Internet Archives currently uses media migration to avoid media obsolescence and take advantage of technological advances to reduce costs. As a safety measure, backup copies of a part of the collection are also created. Google: Google claims to have the largest index of Web sites available on the World Wide Web and the industry’s most advanced search technology. Google’s Web site also contains an archive of Usenet messages that cover the past 20 years (see fig. 7).[Footnote 61] Usenet is a collection of text messages that are posted on Internet electronic bulletin boards. These bulletin boards--which existed before E-mail, Web browsers, and the Web itself--provide avenues for communication in an open forum, allowing others to read and reply. Some notable “posts” included in Google’s Usenet Archives are the first post mentioning Microsoft (1981), the first post mentioning a compact disc (1982), and the posts sent just after the September 11 attacks. Figure 7: Google’s Usenet Archive: [See PDF for image] Source: Google. [End of figure] Google currently provides access to more than 700 million messages dating back to 1981, and this number is rapidly increasing. Google’s collection is by far the most complete collection of Usenet articles ever assembled. Before Google’s acquisition of the archive, posts without activity were usually deleted from the live discussion forums after a few days or weeks, and therefore they were not viewable or searchable by users. Some feel that Google’s Usenet archive is an irreplaceable and invaluable reference, representing “the human side of the Internet” through first-hand accounts of historical events. 
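The scale and cost figures quoted above for the Internet Archives can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers stated in this appendix (100 terabytes, 10 billion Web pages, about $4,000 per terabyte, and growth of 12 terabytes per month). It assumes decimal terabytes (10 to the 12th power bytes) and a cost that scales linearly with volume; both are simplifying assumptions, and the variable names are illustrative.

```python
TB = 10**12  # bytes per terabyte, assuming decimal units

collection_tb = 100          # stated size of the Web collection
pages = 10 * 10**9           # stated number of Web pages
cost_per_tb = 4_000          # stated approximate storage cost in dollars
growth_tb_per_month = 12     # stated monthly growth

# Implied storage cost of the collection, excluding backups and operations.
storage_cost = collection_tb * cost_per_tb

# Implied average size of a stored Web page, in bytes.
avg_page_bytes = collection_tb * TB // pages

# Implied additional storage cost for each month of growth.
monthly_growth_cost = growth_tb_per_month * cost_per_tb

print(storage_cost, avg_page_bytes, monthly_growth_cost)
```

On these assumptions the collection implies roughly $400,000 in storage, an average of about 10,000 bytes per page, and about $48,000 in added storage per month of growth.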
[End of section] Appendix III: NARA’s Electronic Records Guidance Has Evolved: A review of the development of electronic records guidance issued by the National Archives and Records Administration (NARA) over the last several decades demonstrates the extent to which the rapid evolution of information technology has posed significant challenges for NARA in its role of providing guidance to federal agencies concerning the management of electronic records under the Federal Records Act.[Footnote 62] NARA provides guidance for electronic records management and disposition largely through two sets of guidance: * the electronic records management regulation, which provides general responsibilities for agency management of electronic records;[Footnote 63] and: * the general record schedules, which provide disposal authorization for specific categories of temporary records common to most agencies.[Footnote 64] The history of these two sets of guidance reflects the evolution of NARA’s electronic records guidance. Electronic records management was given a formal role in 1968 when NARA, then the National Archives and Records Service (NARS) of the General Services Administration (GSA), established a unit to develop policies for selecting and preserving electronic records. This Data Archives Staff undertook to develop three sets of guidance: (1) inventory guidance--forms for inventorying magnetic tape files; (2) environmental guidance--recommendations for proper handling and storage of magnetic tape; and (3) GRS 20--a general records schedule for computerized records. Of that guidance, GRS 20 emerged as NARA’s first significant electronic records guidance. It was intended to cover electronic records created by mainframe applications in the then-dominant agency data processing operations. The major purpose was to address the efficient disposition of those electronic records, including destruction of unneeded temporary records and transfer to NARS (NARA) of permanent records. 
The 1972 GRS 20, entitled Data Automation Program Records, stated, “This schedule covers machine readable records, related documentation required for their servicing, and files related to the automatic data processing (ADP) procurement, operations, and management functions.” GRS 20 divided these records into categories that “correspond roughly to the typical organizational and functional structure found in most ADP installations and their parent organizations.”[Footnote 65] According to recent NARA summaries, the 1972 GRS 20 was meant “to provide disposal authority for specific categories of temporary records associated with mainframe applications. Excluded from its coverage, and all subsequent revisions, were the types of records generated by large data systems that might have archival value.”[Footnote 66] The clear meaning of the 1972 GRS 20, however, was that it was not meant merely to identify and provide for efficient disposal of “ancillary materials common to most data processing operations.”[Footnote 67] Quite the contrary, the guidance identified a range of records that should be scheduled through filing of a Standard Form 115. These ranged from various temporary records to potentially permanent records, such as master data files. GRS 20 was revised in 1977.[Footnote 68] While the 1977 revision restructured the 1972 electronic records categories, it retained the earlier purpose of providing disposition instructions for virtually all records associated with data processing operation