This is the accessible text file for GAO report number GAO-02-327 
entitled 'Electronic Government: Challenges to Effective Adoption of 
the Extensible Markup Language' which was released on April 5, 2002. 

This text file was formatted by the U.S. General Accounting Office 
(GAO) to be accessible to users with visual impairments, as part of a 
longer term project to improve GAO products’ accessibility. Every 
attempt has been made to maintain the structural and data integrity of 
the original printed product. Accessibility features, such as text 
descriptions of tables, consecutively numbered footnotes placed at the 
end of the file, and the text of agency comment letters, are provided 
but may not exactly duplicate the presentation or format of the 
printed version. The portable document format (PDF) file is an exact 
electronic replica of the printed version. We welcome your feedback. 
Please E-mail your comments regarding the contents or accessibility 
features of this document to Webmaster@gao.gov. 

United States General Accounting Office: 
GAO: 

Report to the Chairman, Committee on Governmental Affairs, U.S. Senate. 

Electronic Government: 

Challenges to Effective Adoption of the Extensible Markup Language: 

GAO-02-327: 

Contents: 

Letter: 

Executive Summary: 

Purpose: 

Background: 

Results in Brief: 

Principal Findings: 

Recommendations for Executive Action: 

Agency Comments and Our Evaluation: 

Chapter 1: Background: Features and Current Federal Use of XML: 
Standardized Data Tagging Facilitates Information Exchange among 
Disparate Systems: 
XML Supports Internet-Based Data Exchange: 
XML's Technical Standards Provide the Tools to Describe and
Exchange Data over the Internet: 
XML Was Designed to Accommodate Numerous Extensions: 
XML Can Enhance Information Search, Retrieval, and Analysis: 
XML Usage Complements Traditional Electronic Data Interchange
Applications: 
Federal XML Projects Vary in Size and Scope: 
Objectives, Scope, and Methodology: 

Chapter 2: A Comprehensive Set of Standards for Implementing XML Is 
Only Partially in Place: 
XML Technical Standards Have Largely Been Defined: 
Additional Standards Have Been Proposed for Using XML to Conduct 
Electronic Business: 
Business Process Standards Are Less Well-Developed than Technical 
Standards: 
Potentially Useful XML Vocabularies Are Not Ready for Governmentwide 
Adoption: 

Chapter 3: The Federal Government Faces Challenges in Realizing XML's 
Full Potential: 
Implementing XML Presents Pitfalls: 
Governmentwide Actions to Promote XML Adoption Have Focused on 
Education and Outreach: 
Federal Government Needs Have Not Been Consolidated for Input to 
Standards-Setting Bodies: 
XML Interoperability across the Government Depends on an Effective 
Cross-Agency Registry: 
XML Implementations Can Be More Effective within the Context of an 
Enterprise Architecture: 

Chapter 4: Conclusions and Recommendations: 
Conclusions: 
Recommendations for Executive Action: 
Agency Comments and Our Evaluation: 

Appendix I: Comments from the National Aeronautics and Space 
Administration: 

Appendix II: Comments from the National Archives and Records 
Administration: 

Glossary: 

Tables: 

Table 1: Comparison of HTML and XML: 

Table 2: Basic XML Components: 

Table 3: Comparison of EDI and XML: 

Table 4: XML Technical Standards as of February 2002: 

Table 5: Representative Industry-Specific XML Vocabularies: 

Table 6: Strengths and Pitfalls of XML: 

Figures: 

Figure 1: A Hypothetical XML-Based State Driver's License System: 

Figure 2: XML Code Example: 

Figure 3: XML Can Facilitate the Use of Different User Interfaces and 
Display Devices: 

Figure 4: A "Request for Quotation" Formatted as an EDI Message: 

Figure 5: Typical Flow of Business Transactions Based on EDI Standards: 

Figure 6: Representative ebXML Transaction: 

Figure 7: Using a Registry of XML Data Elements and Structures: 

Abbreviations: 

ANSI: American National Standards Institute: 

CIO: chief information officer: 

DISA: Defense Information Systems Agency: 

DOD: Department of Defense: 

DTD: document type definition: 

ebXML: electronic business XML: 

EDGAR: Electronic Data Gathering, Analysis, and Retrieval: 

EDI: Electronic Data Interchange: 

EPA: Environmental Protection Agency: 

GAO: General Accounting Office: 

GSA: General Services Administration: 

HTML: Hypertext Markup Language: 

IT: information technology: 

NIST: National Institute of Standards and Technology: 

OASIS: Organization for the Advancement of Structured Information 
Standards: 

OMB: Office of Management and Budget: 

SEC: Securities and Exchange Commission: 

UDDI: Universal Description, Discovery, and Integration: 

United States General Accounting Office: 
Washington, DC 20548: 

April 5, 2002: 

The Honorable Joseph I. Lieberman: 
Chairman: 
Committee on Governmental Affairs: 
United States Senate: 

Dear Mr. Chairman: 

This report responds to your request that we review the status of 
Extensible Markup Language (XML) technology and the challenges the 
federal government faces in implementing it. XML is a flexible, 
nonproprietary set of standards designed to facilitate the exchange of 
information among disparate computer systems, using the Internet's 
protocols. Specifically, we agreed to assess (1) the overall 
development status of XML standards to determine whether they are 
ready for governmentwide use and (2) challenges faced by the federal 
government in optimizing its adoption of XML technology to promote 
broad information sharing and systems interoperability. The report 
recommends that the director of the Office of Management and Budget 
(OMB) take steps to improve the federal government's planning for 
adoption of XML. 

As agreed with your office, unless you publicly announce the contents 
of this report earlier, we plan no further distribution until 30 days 
from the report date. At that time, we will send copies of this report 
to the ranking minority member, Committee on Governmental Affairs, and 
interested congressional committees. We will also send copies to the 
director of OMB. Copies will be made available to others upon request. 
The report will also be available on our home page [hyperlink, 
http://www.gao.gov]. 

If you have any questions concerning this report, please call me at 
(202) 512-6257 or send e-mail to mcclured@gao.gov. Other major 
contributors included Barbara S. Collier, John de Ferrari, Chetna Lal, 
Steven Law, Anh Le, John C. Martin, and Mark D. Shaw. 

Sincerely yours, 

Signed by: 

David L. McClure: 
Director, Information Technology Management Issues: 

[End of section] 

Executive Summary: 

Purpose: 

The Extensible Markup Language (XML) is a flexible, 
nonproprietary set of standards for annotating or "tagging" 
information so that it can be transmitted over a network such as the 
Internet and readily interpreted by disparate computer systems. 
[Footnote 1] It is increasingly being promoted by information 
technology (IT) developers as the basis for making computerized data 
much more broadly accessible and usable than has previously been 
possible. As a result, many organizations, including both private 
businesses and federal government agencies, are building applications 
that try to take advantage of XML's unique features. Given the 
widespread interest in adopting this new technology, the chairman of 
the Senate Committee on Governmental Affairs asked GAO to assess (1) 
the overall development status of XML standards to determine whether 
they are ready for governmentwide use and (2) challenges faced by the 
federal government in optimizing its adoption of XML technology to 
promote broad information sharing and systems 
interoperability.[Footnote 2] 

Background: 

Advances in the use of IT—especially the rise of the Internet—are 
changing the way private sector businesses, government agencies, and 
other organizations communicate, exchange information, and conduct 
business among themselves and with the public. The Internet offers the 
opportunity for a much broader and more immediate exchange of 
information than was previously possible, because it provides a 
virtually universal communications link to a multitude of disparate 
systems. However, although the Internet can facilitate the exchange of 
information, much of the information displayed to users is delivered 
only as a stream of computer code to be visually displayed by Web 
browsers, such as Internet Explorer or Netscape Communicator. For 
example, an economist might visit a Web page that displayed 
statistical information about the production of various agricultural 
commodities over a number of years. Typically, such a Web page would 
only display this information to the economist to examine visually on 
his or her computer screen. Without special translation software, it 
would likely be difficult for the economist to transfer the 
information to a separate computer program for further statistical 
analyses. 

An agreed-upon standard for labeling or "tagging" each element of the 
computerized data set could facilitate the automatic identification 
and processing of such information. For example, the economist's Web 
page would likely display many numbers representing specific pieces of 
information. The number "2,400,000.00" might appear, representing the 
value of soybeans produced in a given place at a given time. Even if 
the economist's computer had been programmed to analyze agricultural 
cost data, it would not be able to recognize that "2,400,000.00" 
referred to a specific value for soybeans at a given place and time, 
unless the number were tagged with that descriptive information in a 
format the computer system understood. Tagging data according to 
standard formats and definitions would allow systems that recognize 
those standards to readily understand and process the data. 

Currently, the XML set of standards is generally considered to be a 
primary candidate for filling the role of an Internet family of 
standards for tagging data. If implemented broadly and consistently, 
XML offers the promise of making it significantly easier for 
organizations and individuals to identify, integrate, and process 
complex information that may initially be widely dispersed among 
systems and organizations. For example, law enforcement agencies could 
potentially better identify and retrieve information about criminal 
suspects from any number of federal, state, and local databases. 
Further, XML could also make it easier to conduct business 
transactions over the Internet, because it offers a standard way to 
label and package the information that needs to be exchanged to 
conduct electronic business. 

Rather than a single specification, XML is a collection of related 
standards. Two types of standards are essential for effective use of 
XML across organizations in either the public or private sector: (1) 
technical standards, which define the basic rules for tagging, 
structuring, and displaying information; and (2) business standards, 
which provide the vocabulary and protocols for conducting business 
electronically. The core XML standard was designed to accommodate a 
wide variety of supplemental standards, or extensions, to address 
additional functions and meet specialized needs. 

XML is not the first attempt by IT developers—or the federal 
government—to standardize the process of data exchange. Much effort, 
for example, was spent over many years to develop the Electronic Data 
Interchange (EDI) standards, which remain in use today and are 
expected to continue in use alongside XML. However, EDI use has been 
largely limited to data exchanges among large organizations, because
implementing EDI generally entails buying customized proprietary 
software and setting up expensive, private communications networks. 
XML has the potential for broader implementation because it requires 
less customization and uses the Internet's data communications 
infrastructure, which is already in place. 

Federal XML projects undertaken to date have varied significantly in 
size and scope. In many cases, agencies have used XML to enhance data 
exchange within well-defined communities of interest with well-defined 
data exchange requirements. In addition, several larger agencies have 
been making efforts to define XML-related data standards for larger
communities of interest. For example, the Environmental Protection 
Agency has been working with state environmental agencies to develop 
XML data standards for a national network of environmental information. 

Results in Brief: 

While XML's technical standards—such as specifications for tagging, 
exchanging, and displaying information—have largely been worked out by 
commercial standards-setting organizations and are already in use, 
equally important business standards are not as mature and may 
complicate near-term implementation. For example, standards are not 
yet complete for (1) identifying potential business partners for 
transactions, (2) exchanging precise technical information about the 
nature of proposed transactions so that the partners can agree to 
them, and (3) executing agreed-upon transactions in a formal, legally 
binding manner. Many standards-setting organizations in the private 
sector are creating various XML business standards, and it will be 
important for the federal government to adopt those that achieve 
widespread acceptance. However, it is not yet clear which business 
standards meet this criterion. In addition, key XML vocabularies 
tailored to address specific industries and business activities are 
still in development and not yet ready for governmentwide adoption. 

Given that a complete set of XML-related standards is not yet 
available, system developers must be wary of several pitfalls 
associated with implementing XML that could limit its potential to 
facilitate broad information exchange or adversely affect 
interoperability, including (1) the risk that redundant data 
definitions, vocabularies, and structures will proliferate, (2) the 
potential for proprietary extensions to be built that would defeat 
XML's goal of broad interoperability, and (3) the need to maintain 
adequate security. In addition to these pitfalls, which all systems 
developers must address, the federal government faces additional 
challenges as it attempts to gain the most from XML's potential. 
Specifically: 

* No explicit governmentwide strategy for XML adoption has been 
defined to guide agency implementation efforts and ensure that agency 
enterprise architectures address incorporation of XML. Although 
agencies need flexibility to tailor XML-based systems to meet their 
unique needs, they risk building and buying systems that will not work 
with each other in the future if their efforts do not take place 
within the context of a well-defined strategy. 

* The needs of federal agencies have not been uniformly identified and 
consolidated so that they can be represented effectively before key 
standards-setting bodies. It will be important for the federal 
government to leverage and build upon commercially developed standards 
and XML vocabularies as they become mature and widely accepted. If 
federal requirements are not better understood and consolidated, the 
government may be unable to effectively provide input to these 
standards while they are still under development. 

* The government has not yet established a registry of 
government-unique XML data structures (such as data element tags and associated 
data definitions) that system developers can consult when building or 
modifying XML-based systems. Without such a registry, developers are 
less likely to build systems using compatible data definitions, which 
would likely defeat the goal of broad data access and exchange. In 
order to establish such a registry, policies and procedures for adding 
tag definitions and maintaining the system would also be needed and 
have not yet been developed. 

* Much also needs to be done to ensure that agencies address XML 
implementation through enterprise architectures so that they can 
maximize XML's benefits and forestall costly future reworking of their 
systems. 

To address these challenges, GAO is making recommendations to the 
director, Office of Management and Budget (OMB), to enhance federal 
planning for adoption of XML. 

Principal Findings: 

A Complete Set of Standards for Implementing XML Is Only Partially in 
Place: 

Key technical standards for XML have been largely worked out under the 
auspices of the World Wide Web Consortium (W3C).[Footnote 3] These 
technical standards are focused on providing the generic structure and 
tools to tag data, transmit it over the Internet, and allow it to be 
processed by the computer systems that receive it. 

Business standards, though equally important, are generally less 
well-developed, and reaching agreement on them is proving to be difficult 
when multiple communities of interest are involved. Business standards 
are needed to provide a more complete framework for conducting 
business over the Internet, including advertising products and 
services so that potential buyers and sellers can find each other, 
proposing and agreeing upon electronic transactions, and executing the 
agreed-upon transactions. Business standards are also needed to define 
vocabularies for the specific data elements that are to be exchanged 
when these transactions are conducted. 

Unlike XML technical standards, which are all established and 
maintained by the W3C, business standards are developed by a variety 
of public and private sector organizations, including industry 
consortia, and are not always universally supported. For example, a 
number of different approaches to addressing the process of conducting 
business transactions have been proposed, including electronic 
business XML (ebXML), RosettaNet, and XML-based Web services. These 
different approaches continue to vie for support and offer 
functionality that is in part overlapping and incompatible. Because 
uncertainty remains about which business standards will ultimately 
prevail, applications based on any of the current proposals may be at 
risk of being incompatible with future standards. In addition, without 
universally accepted standards, commercial IT vendors may be using XML 
extensions that are nonstandard and divergent and that may limit 
interoperability. 

In industries and professions where needs are well-defined and 
cohesive communities of interest exist, standard data vocabularies 
have been successfully developed. For example, mathematicians have 
created an XML vocabulary called the Mathematical Markup Language that 
allows them to insert equations into Web pages that can then be copied 
into specialized software applications and immediately used for 
calculations. Some of these vocabularies, once fully developed, may be 
useful to the government as well. However, many of these potentially 
useful standard vocabularies are still in the initial stages of 
development and do not provide all the data structures needed to 
support current needs. Using them at this time would mean taking the 
risk that future developments could diverge from these early standards 
and limit interoperability with them. As a result, they are not yet 
ready for governmentwide adoption. 

The Federal Government Faces Challenges in Realizing XML's Full 
Potential: 

Although XML offers the potential to greatly facilitate the 
identification, integration, and processing of complex information—
both within the federal government and externally—system developers 
face a number of pitfalls in implementing the technology. One risk is 
that markup languages, data definitions, and data structures will 
proliferate. If organizations develop their systems using unique, 
nonstandard data definitions and structures, they will be unable to 
share their data externally without providing additional instructions 
to translate data structures from one organization and system to 
another, thus defeating one of XML's major benefits. Likewise, 
software vendors and system developers may be tempted to add 
proprietary extensions to the XML standards when they build specific 
systems. Such systems might then be less able to freely exchange 
information with other XML-enabled systems. In addition, implementing 
XML in an organization could create new security vulnerabilities if 
steps are not taken in designing the system to mitigate this risk. 

In addition to these pitfalls, which all systems developers must 
address, the federal government faces additional challenges as it 
attempts to gain the most from XML's potential. Specifically: 

To date, neither OMB, which is responsible for developing and 
overseeing governmentwide policies and guidelines for agency IT 
management, nor the National Institute of Standards and Technology 
(NIST), which is responsible for developing federal information 
processing standards and guidelines, has formulated an explicit 
governmentwide strategy for XML adoption to guide agency 
implementation efforts and ensure that agency enterprise architectures 
address incorporation of XML. Activities within the federal government 
to promote broad governmentwide adoption of XML technology have been 
limited. Most governmentwide coordination has been limited to the 
activities of the XML Working Group, chartered by the federal Chief 
Information Officers (CIO) Council. The working group's activities 
have focused on education and outreach rather than developing a 
strategy for adopting XML. Without agreement on a governmentwide
implementation strategy, agencies risk building and buying systems 
that will not work with each other in the future. 

The federal government as a whole has neither identified cross-agency 
and governmentwide requirements for XML nor developed a dictionary of 
inherently governmental data tags and definitions. Further, no process 
has been defined for consolidated collaboration with commercial 
standards bodies to ensure that government requirements are identified 
and incorporated. Past experience coordinating federal requirements 
for EDI suggests that an effective approach is to task a central 
committee with collecting requirements from federal agencies and 
representing the government on key standards groups. 

Given that it is challenging to agree upon predefined XML 
vocabularies, other approaches can be adopted to encourage broad, 
consistent use of data definitions and structures. Specifically, a 
"bottom up" approach is to establish a centralized registry of key XML 
data elements and structures and coordinate its use by XML systems 
developers. With this arrangement, developers have the incentive to 
reuse data structures found in the registry because doing so reduces 
costs and brings about interoperability with other existing systems. 
The federal XML Working Group, chartered by the CIO Council, is 
working to create a pilot version of a governmentwide registry, based 
on a registry previously developed by the Defense Logistics Agency. 
However, further work will be needed to set policies and guidelines to 
ensure the effectiveness of the registry in promoting governmentwide 
systems interoperability. 
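
The registry arrangement described above can be illustrated with a minimal sketch in Python. This is a toy, in-memory stand-in written only for illustration; it does not represent the pilot registry or any actual registry design. The class name, method names, tag names, definitions, and agency labels are all hypothetical. 

```python
# A toy in-memory registry. A real registry would be a shared, governed
# service; every name and definition below is a hypothetical example.
class ElementRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, tag, definition, owner):
        """Add a tag definition; refuse to redefine an existing tag."""
        if tag in self._entries:
            raise ValueError(f"{tag!r} is already registered; reuse it instead")
        self._entries[tag] = {"definition": definition, "owner": owner}

    def lookup(self, tag):
        """Return the registered entry for a tag, or None if absent."""
        return self._entries.get(tag)

    def search(self, keyword):
        """Return the sorted tags whose definition mentions the keyword."""
        keyword = keyword.lower()
        return sorted(
            tag for tag, entry in self._entries.items()
            if keyword in entry["definition"].lower()
        )


registry = ElementRegistry()
registry.register("FacilityName", "Official name of a regulated facility", "Agency A")
registry.register("FacilityIdentifier", "Unique identifier assigned to a facility", "Agency A")

# A developer at another agency searches before inventing a new tag.
matches = registry.search("facility")
print(matches)
```

In this sketch, a developer who finds a suitable entry reuses the existing tag rather than defining a new one, which is the incentive the registry approach relies on. Questions such as who may add entries and how conflicts are resolved are the kind of policy matters that would still need to be settled. 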

Another avenue for promoting interoperability is to ensure that sound 
XML implementation strategies are adopted and documented on an 
agency-by-agency basis through development of enterprise architectures. 
Effective XML implementation depends on complete and well-established 
data definitions and structures, which can be best obtained through 
the process of defining and adopting an enterprise architecture. Such 
an architecture provides the foundation for maximizing XML's benefits 
and forestalling costly future reworking of agency systems. 

If these challenges are not addressed, the use of XML in the federal 
government may have only limited benefits and may not achieve the 
technology's promise of facilitating broad interoperability among 
disparate systems. 

Recommendations for Executive Action: 

Given the statutory responsibility of OMB to develop and oversee 
governmentwide policies and guidelines for agency IT management, we 
recommend that the director of OMB, working in concert with the 
federal CIO Council and NIST, develop a strategy for governmentwide 
adoption of XML to guide agency implementation efforts and ensure that 
the technology is addressed in agency enterprise architectures. This 
strategy should, at a minimum, specify how the federal government will 
carry out the following tasks: 

* Developing a process with defined roles, responsibilities, and 
accountability for identifying and coordinating government-unique 
requirements and presenting consolidated, focused input to private 
sector standards-setting bodies during the development of XML 
standards. This process could be patterned after the current process 
that is in place for EDI coordination among federal agencies, or OMB 
might consider adapting the EDI process to cover XML as well. Guiding 
the overall process should be the presumption that mature, agreed-upon 
commercial standards will be adopted by the government whenever 
possible. 

* Developing a project plan for transitioning the CIO Council's pilot 
XML registry effort into an operational governmentwide resource. This 
plan should include identifying time frames and resources needed to 
implement and maintain an operational registry linked to agency 
repositories of standard data structures. 

* Setting policies and guidelines for managing and participating in 
the governmentwide XML registry, once it is operational, to ensure its 
effectiveness in promoting data sharing capabilities among federal 
agencies. These policies should clarify the roles and responsibilities 
of specific agencies and should consider including definitions of 
classes of compliance, which could be used to categorize how 
rigorously organizations adhere to the policies. Further, these 
policies should promote the consistent use of XML namespaces to 
resolve potential ambiguity in data references across XML documents. 
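
The role of namespaces mentioned in the last task above can be illustrated with a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. The document, the element names, and the namespace addresses (placeholders under example.org) are hypothetical and were invented for this illustration. 

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a <title> element.
# The namespace URIs are placeholders invented for this sketch.
DOC = """
<record xmlns:per="http://example.org/ns/personnel"
        xmlns:prop="http://example.org/ns/property">
  <per:title>Program Analyst</per:title>
  <prop:title>Deed of Trust</prop:title>
</record>
"""

# Prefix-to-URI mapping used when searching the parsed document.
NS = {
    "per": "http://example.org/ns/personnel",
    "prop": "http://example.org/ns/property",
}

root = ET.fromstring(DOC.strip())

# The namespace qualifies each reference, so the two <title>
# elements cannot be confused with one another.
job_title = root.find("per:title", NS).text
property_title = root.find("prop:title", NS).text

print(job_title)
print(property_title)
```

Without the namespace qualifiers, a receiving system would have no reliable way to tell which of the two "title" elements it had been given. 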

In addition, as part of its ongoing process for reviewing agency IT 
architectures and annual budget requests, we recommend that OMB ensure 
that agencies' business needs for XML technology are defined in their 
enterprise architectures. Specifically, OMB should specify 
requirements for documenting the usage of XML standards and products 
in the standards profile section of the architecture—the section that 
defines the set of rules governing systems implementation and 
operation. 

Agency Comments and Our Evaluation: 

In oral comments on a draft of this report, officials from OMB's 
Office of Information and Regulatory Affairs, including the 
Information Policy and Technology Branch chief, generally agreed with 
our findings and conclusions and stated that they would consider our 
recommendations. The officials also provided information on recent OMB 
actions aimed at promoting the adoption of XML by federal agencies. We 
have incorporated this updated information in the report. We view 
these recent OMB actions as positive steps. Nevertheless, we also 
believe that OMB can improve on these actions by implementing the 
recommendations in this report. 

We received oral comments from the co-chairmen of the XML Working 
Group; officials of NIST's Information Technology Laboratory; and the 
deputy associate administrator, Office of Electronic Commerce, General 
Services Administration. We also received written comments from the 
chief information officer, National Aeronautics and Space 
Administration; and the director for policy and communications staff, 
National Archives and Records Administration. Letters from these 
latter two agencies are reprinted in appendixes I and II. All of the 
agency officials who reviewed the draft agreed with the overall 
content of the report. Officials from the XML Working Group and the 
National Archives and Records Administration expressed concern that 
the draft overemphasized the value of a "top down" XML implementation 
strategy that emphasizes executive direction and guidance as opposed 
to a "bottom up" approach relying on individual initiative at lower 
management levels. We believe that it is important to strike a balance 
between the two approaches. In response to this concern, we are 
including language in the final report to emphasize that a balance 
between the bottom up and top down approaches is needed. In addition, 
each agency provided technical comments, which have been addressed 
where appropriate in the final report. 

[End of section] 

Chapter 1: Background: Features and Current Federal Use of XML: 

Advances in the use of information technology (IT)—especially the rise 
of the Internet—are changing the way organizations communicate, 
exchange information, and conduct business among themselves and with 
the public. The Internet offers the opportunity for a much broader 
exchange of information than was previously possible, because it 
provides a virtually universal communications link to the multitude of 
disparate systems operated by private sector businesses, government 
agencies, and other organizations. 

However, although the Internet can facilitate the exchange of 
information, much of the information displayed to users is delivered 
only as a stream of computer code to be visually displayed by Web 
browsers, such as Internet Explorer or Netscape Communicator. Without 
human intervention, such information cannot be extracted and reused 
for other purposes. For example, an economist might visit a Web page 
that displayed statistical information about the production of various 
agricultural commodities over a number of years. Typically, such a Web 
page would only display this information to the economist to examine 
visually on his or her computer screen. Without special translation 
software, it would likely be difficult for the economist to transfer 
the information to a separate computer program for further statistical 
analyses. 

An agreed-upon standard for annotating or "tagging" each element of 
the computerized data set could facilitate the automatic 
identification and processing of such information. For example, the 
economist's Web page would likely display many numbers representing 
specific pieces of information. The number "2,400,000.00" might 
appear, representing the value of soybeans produced in a given place 
at a given time. Even if the computer system had been programmed to 
analyze agricultural cost data, it would not be able to recognize that 
"2,400,000.00" referred to a specific value for soybeans at a given 
place and time, unless the number were tagged with that descriptive 
information in a format that the computer system understood. 

Tagging data in a standard way allows any system that recognizes the 
standard to readily understand and process data that conforms to that 
standard. In tagging, a standard format is used to label each element 
of a data set with metadata[Footnote 4] that clarifies what kind of 
information is being provided. Common tagging systems for electronic 
information—also known as markup languages—use labels set off by 
angled brackets to show where data elements begin and end: for 
example, in <label> data </label>, the second tag 
includes a slash to indicate that it is a closing tag. 
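
As a minimal illustrative sketch of this idea (not taken from any published vocabulary), the following Python fragment reads a small XML document in which the soybean value is tagged with descriptive information. The element and attribute names, as well as the place and year values, are hypothetical and were chosen only for this example; in practice they would be set by an agreed-upon standard. 

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged data. The tag names, attribute names, place, and
# year are assumptions made for this sketch, not a real vocabulary.
TAGGED = """
<production>
  <commodityValue commodity="soybeans" place="Example County"
                  year="2001" currency="USD">2400000.00</commodityValue>
</production>
"""


def read_values(xml_text):
    """Return a list of (commodity, place, year, value) tuples."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for item in root.iter("commodityValue"):
        results.append((
            item.get("commodity"),
            item.get("place"),
            int(item.get("year")),
            float(item.text),
        ))
    return results


values = read_values(TAGGED)
print(values)
```

Because the number arrives labeled with its commodity, place, and year, a program can pass it directly to further analysis without a person first interpreting what the figure represents. 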

The Extensible Markup Language (XML) is a flexible, nonproprietary set 
of standards for tagging information so that it can be transmitted 
over a network such as the Internet and readily interpreted by 
disparate computer systems. If implemented broadly with consistent 
data definitions and structures, XML offers the promise of making it 
significantly easier for organizations and individuals to (1) 
identify, integrate, and process information that may initially be 
widely dispersed among systems and organizations, and (2) conduct 
transactions based on exchanging and processing such information, a 
key element for federal agencies positioning themselves to provide 
electronic government services to citizens and businesses. 

In a previous attempt to standardize the process of data exchange, 
much effort was spent over many years to develop Electronic Data 
Interchange (EDI) standards, which are in use today and will probably 
continue to be used alongside XML. However, their use has been largely 
limited to data exchanges among large businesses and government 
agencies, because implementing EDI generally entails buying customized 
proprietary software and setting up expensive, private communications 
networks. XML has the potential for broader implementation because it 
was designed to take advantage of the Internet's capabilities and 
protocols, which are already in place. 

Federal XML projects undertaken to date have varied significantly in 
size and scope. In many cases, agencies have used XML to enhance data 
exchange within well-defined communities of interest with well-defined 
data exchange requirements. In addition, several larger agencies have 
been making efforts to define XML-related data standards for larger
communities of interest. For example, the Environmental Protection 
Agency (EPA) has been working with state environmental agencies to 
develop XML data standards for a national network of environmental 
information. 

Standardized Data Tagging Facilitates Information Exchange among 
Disparate Systems: 

Identifying, exchanging, and integrating information from different 
and perhaps unfamiliar sources are functions that are essential to the 
effective use of networked information for a wide range of goals, 
including the provision of electronic government services. Federal 
agencies exchange data with many external entities, including other 
federal and state agencies, private organizations, and foreign 
governments. For example, federal agencies routinely use data 
exchanges to transfer funds to contractors and grantees; collect data 
necessary to make eligibility determinations for veterans, social 
security, and Medicare benefits; gather data on program activities to 
determine if funds are being expended as intended and the expected 
outcomes achieved; and share weather information that is essential for 
air flight safety. 

If a data exchange does not function properly, the data being received 
by a computer system could cause it to malfunction or produce 
inaccurate results, or the data may not be received at all. However, 
because systems providing information to an organization are 
frequently external or were developed for other purposes, they may 
structure and format the needed information in incompatible and 
unpredictable ways, making data exchange problematic. Effective data 
sharing among computer systems faces many problems, including: 

* incompatible operating systems and hardware platforms, 

* incompatible computer applications written in different programming 
languages, 

* inconsistent or poorly developed data definitions, and, 

* incompatible data transmission protocols. 

Without predefined standards in place, systems developers may need to 
define in detail the precise steps to be taken to carry out the 
exchange of a set of data, and these definitions must be encoded in 
the software and hardware of both transmitting and receiving systems, 
a potentially complex, time-consuming, and expensive process. 

In contrast, if standards are in place for how data are structured and 
tagged, it can be more efficient and less expensive to develop 
interfaces, and as a result data exchange can be facilitated. A 
hypothetical state driver's license system offers a good conceptual 
example of the potential benefits of a data tagging standard for (1) 
interfacing disparate systems and (2) locating and sharing data among 
these systems. In processing an application for a driver's license, a 
state government agency might want to consult a number of local, 
state, or federal databases before issuing or renewing the license, 
including records of residency, traffic violations, criminal 
convictions, tax payments, and others. In today's environment, each of 
these systems could be operated by a different entity and could use 
incompatible systems software and computer applications, which could 
cause data-sharing problems. One solution would be to tag data in a 
standard way so that it could be easily shared among all these systems 
and databases. 

Standardized tagging helps solve the problem by formatting both the 
data and relevant information about the data according to a standard 
that can be readily interpreted by any other system that recognizes 
that format and understands the data definitions and structures that 
are used. In our example, each state agency may have relevant 
information about a driver's license applicant stored in a different 
format. The applicant's name might be called "Name" in one system but 
divided into "Lastname," "Firstname," and "MiddleInitial" in another 
system. Further, the database system software running at each agency 
might use different commands and programming syntax to access and 
query its databases, requiring that any system wanting to connect and 
access its data conform to that agency's unique structures. However, 
if the data were made available to other organizations using a 
standardized tagged format, these agency-unique discrepancies could be 
overcome. All name information, for example, might be consistently 
tagged as <Name>. Even if it did not use this standard tag 
internally, each state agency would be responsible for matching up its 
internal data structures to the appropriate standard data tags, which 
would have agreed-upon definitions. The standard tags would make it easy 
to connect to each agency and exchange relevant information, because 
each exchange would use the same format to transfer the data and annotate 
(tag) what it means. Of course, policies and procedures would still be 
needed to ensure that the data were exchanged only for authorized purposes, 
and each system would have to conform to the standards in use and agree on 
standard data definitions and structures. 
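
The matching step described above might be sketched as follows. The 
function name to_standard_name and the sample records are hypothetical; 
the sketch assumes only that one agency stores a single "Name" field 
while another stores "Lastname," "Firstname," and "MiddleInitial," and 
that both have agreed to publish a single Name tag. 

```python
import xml.etree.ElementTree as ET


def to_standard_name(record):
    """Map an agency's internal name fields onto one standard <Name> tag."""
    if "Name" in record:
        full_name = record["Name"]
    else:
        parts = [
            record.get("Firstname"),
            record.get("MiddleInitial"),
            record.get("Lastname"),
        ]
        # Skip any part the agency does not hold.
        full_name = " ".join(p for p in parts if p)
    element = ET.Element("Name")
    element.text = full_name
    return ET.tostring(element, encoding="unicode")


# Two hypothetical internal layouts for the same applicant.
agency_a = {"Name": "Pat Q Doe"}
agency_b = {"Lastname": "Doe", "Firstname": "Pat", "MiddleInitial": "Q"}

print(to_standard_name(agency_a))
print(to_standard_name(agency_b))
```

Each agency keeps its own internal layout; only the exchanged form is 
standardized. 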

Figure 1 shows the role that a set of tagging standards such as XML 
could play in facilitating data sharing among disparate agencies. 

Figure 1: A Hypothetical XML-Based State Driver's License System: 

[Refer to PDF for image: illustration] 

Request for driver's license: 

State agency processes the request, using XML, sends and receives data 
to and from: 

Tax records; 
Criminal records; 
Traffic violations; 
Other state and federal information systems. 

Agency decision: issue driver's license or reject request. 

Source: GAO. 

[End of figure] 

Tagging data in a consistent, standard way can also make it much 
easier to locate information that is dispersed among incompatible 
computer databases and difficult to access. In the example of the 
driver's license application, the fact that an applicant had a 
criminal record might remain unknown to the licensing agency if the 
information was stored in an incompatible (and thus inaccessible) 
database. On the other hand, consistent, standardized tagging would 
help make the information much easier to find, because the licensing 
agency could perform a search based on a standard tag definition, 
knowing that all relevant information should be tagged in the same way 
and thus should be identified by that search. The standardized tagging 
of data has the potential to bring a similar benefit to individuals 
searching for information over the Internet. Instead of simply finding 
instances of text that match a given string of characters, Web-based 
search engines could locate and report on data by examining tags 
reflecting the content of the data. In all likelihood, such searches 
would produce more focused and useful results. 
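
The difference between matching a string of characters and searching 
on a tag can be shown with a small sketch. The records and the 
"conviction" tag below are invented for illustration and do not 
reflect any agency's actual data. 

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the <conviction> tag is an assumed standard tag.
records = ET.fromstring("""
<records>
  <person>
    <Name>Pat Doe</Name>
    <conviction>burglary</conviction>
  </person>
  <person>
    <Name>Lee Burglary-Smith</Name>
  </person>
</records>
""")

# Text matching: any record whose text happens to contain "burglary".
text_hits = [
    p.findtext("Name")
    for p in records.iter("person")
    if "burglary" in "".join(p.itertext()).lower()
]

# Tag-based matching: only records that carry a <conviction> element.
tag_hits = [
    p.findtext("Name")
    for p in records.iter("person")
    if p.find("conviction") is not None
]

print(text_hits)
print(tag_hits)
```

The text search also returns the second person, whose surname merely 
contains the search string; the tag-based search does not. 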

XML Supports Internet-Based Data Exchange: 

XML is a nonproprietary set of standards for tagging information so 
that it can be transmitted over a network such as the Internet and 
readily interpreted by many different computer systems. It is platform-
independent, meaning that it can operate on any combination of 
computer hardware and XML-enabled software. The core XML standard, 
known as XML 1.0, was adopted in 1998 by the World Wide Web Consortium 
(W3C), which has jurisdiction over the Internet's technical standards. 
It is a subset of the well-established Standard Generalized Markup 
Language, which was approved and published by the International 
Organization for Standardization in the 1980s[Footnote 5] and is used 
primarily in large organizations for tagging technical documents. 

XML code is designed to be clearly intelligible to a human reader and 
involves embedding descriptive tags around data in a computerized text 
file. Figure 2 shows a simple example where "President George 
Washington" has been tagged in XML to indicate what kind of data each 
of the three words represents. The "NAME" tag uses a hierarchical 
structuring capability to distinguish two subcategories of tags, 
"FIRST" and "LAST." All XML documents have the ability to structure 
data in a similar hierarchical manner. The example also includes the 
use of a data attribute—a rank of "1" has been assigned to the office 
of the president. 

Figure 2: XML Code Example: 

[Refer to PDF for image: illustration] 

[End of figure] 
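
Because the figure is not reproduced in this text file, the sketch 
below gives one plausible rendering of the markup the paragraph 
describes and reads it with Python's standard library. The NAME, 
FIRST, and LAST tags come from the description above; the PRESIDENT 
root element, the TITLE tag, and the RANK attribute name are 
assumptions made for this illustration and may differ from the printed 
figure. 

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction: only NAME, FIRST, and LAST are named in the
# text; PRESIDENT, TITLE, and RANK are hypothetical.
document = """
<PRESIDENT>
  <TITLE RANK="1">President</TITLE>
  <NAME>
    <FIRST>George</FIRST>
    <LAST>Washington</LAST>
  </NAME>
</PRESIDENT>
"""

root = ET.fromstring(document)

# The hierarchy lets a program address each part of the name separately.
first = root.findtext("NAME/FIRST")
last = root.findtext("NAME/LAST")

# An attribute carries extra information about the tagged element.
rank = root.find("TITLE").get("RANK")

print(root.findtext("TITLE"), first, last, "rank", rank)
```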

Hypertext Markup Language (HTML), the current standard for displaying 
information on the World Wide Web, also uses tags embedded in text 
files and is also a subset of the Standard Generalized Markup Language. 

However, unlike XML, HTML's tags are predefined and are used solely to 
transmit instructions for displaying information on Web pages. HTML 
tags describe document structures (that is, whether text should be 
treated as a heading, a list, a quotation, and so on) and document 
appearance (such as whether text should be emphasized, larger or 
smaller than surrounding text, or in a particular type font or color). 
A Web browser that receives an HTML file simply displays the stream of 
data that it receives according to the HTML instructions, without 
"understanding" what information it is displaying. Table 1 summarizes 
the differences and similarities between HTML and XML. 

Table 1: Comparison of HTML and XML: 

Differences: 

HTML: Tags are predefined and are intended to provide formatting and 
display instructions. 

XML: Data tags are not predefined and can be used to label data 
according to any hierarchical structure. 

HTML: Data in HTML documents generally cannot be interpreted and 
processed without human intervention. 

XML: Data in XML documents can be automatically interpreted and 
processed by XML-enabled systems. 

HTML: Strength is in displaying information on a Web browser. 

XML: Strength is in facilitating data exchange. 

HTML: HTML is designed to overlook syntactical errors and focus on 
displaying information. 

XML: XML is designed to check for syntactical errors and ensure conformance 
with data structures (or templates), when specified. 

Similarities: 

Both are nonproprietary W3C standards that can potentially work on a 
variety of computer systems. 

Both are designed to rely on Internet protocols as a means of 
providing connectivity to a broad range of systems. 

Both are based on the Standard Generalized Markup Language and thus 
are structured as text files with tags that can be read and understood 
by humans. 

[End of table] 
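
The syntax checking noted in table 1 can be seen with any XML parser. 
The following minimal sketch uses the parser in Python's standard 
library; the helper name is_well_formed is hypothetical. This parser 
checks only that a document is syntactically well formed; it does not 
validate a document against a DTD or schema, which would require 
additional software. 

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejects the document instead of guessing at intent.
        return False


print(is_well_formed("<label>data</label>"))  # closing tag present
print(is_well_formed("<label>data"))          # closing tag missing
```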

When a system using XML is developed, several basic components may be 
needed to provide ways to do such things as (1) define the tags that 
are used in an XML document, (2) validate the correct use of a 
document's tags, and (3) provide formatting instructions for 
displaying the data. Table 2 summarizes important basic components that 
are often part of XML implementations currently in use. 

Table 2: Basic XML Components: 

Component: XML document; 
Description: A text document marked up with descriptive tags and 
attributes. An XML document can also begin with declarations that 
refer to other files providing further instructions for interpreting 
and displaying data elements. 

Component: Document type definition (DTD) or XML schema; 
Description: A DTD is a file that describes the structure of XML 
documents and defines how markup tags should be interpreted. A DTD can 
be used to automatically interpret multiple documents in a uniform 
way. XML schemas serve the same function as DTDs but provide greater 
definitional power and are more flexible. For example, XML schemas can 
specify what type of data a tag refers to—such as whether it is an 
integer or a text string. 

Component: Parser; 
Description: Software that reads an XML document and determines the 
structure and properties of the data in the document. 

Component: Style sheet; 
Description: A text file that provides instructions for formatting and 
displaying the information in XML documents. Style sheets can include 
variations depending on the type of device used to access the 
document. For example, the same XML document could be displayed 
differently on a handheld wireless computer or a desktop computer, 
based on different style sheets. 

Component: XML namespace; 
Description: A unique identifier, such as a Web address, referenced at 
the start of an XML document as a source for definitions of the tags 
and other data structures used in the document. An XML document can 
reference more than one namespace.
		
[End of table] 		 

XML's Technical Standards Provide the Tools to Describe and Exchange 
Data over the Internet: 

Because the core W3C XML 1.0 standard provides 
only limited features, an entire family of related technical standards 
has been developed to define and structure in greater detail the ways 
in which XML is to be used. XML's technical standards define the basic 
rules for using XML components to tag, structure, and display 
information. Technical standards can be divided into two groups: core 
standards and supplemental extensions. Core technical standards 
developed by the W3C provide the fundamental rules for using XML and 
include the following: 

* XML 1.0 specifies how to use markup symbols to define and describe 
the content of data elements and their associated attributes. By 
design, XML 1.0 does not focus on providing specifications for 
document processing, such as specific presentation formats and 
processing instructions. Rather, these issues are addressed by other 
standards. 

* XML Stylesheet Language (XSL) describes how to use electronic files 
called style sheets to provide instructions for formatting XML 
documents for display in a variety of visual media. Different style 
sheets are created and used to display the same XML document on 
different media, such as a desktop computer or a palm-sized device. 
XSL includes two extensions of its own—XSL Transformations (XSLT) and 
XSL Formatting Objects (XSL-FO). XSLT makes it possible to convert (or 
transform) the original structure of an XML document to match the 
structure of another XML document. XSL-FO provides the formatting 
vocabulary to carry out such a transformation. 

* The XML Schema standard provides a superset of the capabilities 
found in XML 1.0 for document type definitions (DTDs). It offers 
comprehensive instructions for describing the structure and 
constraining the contents of XML documents. The XML Schema standard 
also specifies a robust system of data types, including a number of 
predefined data types that can be associated with XML data elements 
and attributes to help manage dates, numbers, and other special forms 
of information. 

* The XML Namespace standard provides guidelines for uniquely 
identifying the data definitions that appear in an XML document, thus 
avoiding ambiguity among data elements with the same name that may 
come from different sources. 
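
The role of namespaces can be illustrated with a short sketch in which 
two elements share the name "title" but come from different sources. 
The namespace addresses under example.org and the prefixes "pub" and 
"per" are placeholders invented for this example. 

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define "title"; the namespace URIs are
# hypothetical placeholders.
document = """
<order xmlns:pub="http://example.org/publishing"
       xmlns:per="http://example.org/people">
  <pub:title>Annual Report</pub:title>
  <per:title>Director</per:title>
</order>
"""

root = ET.fromstring(document)
ns = {
    "pub": "http://example.org/publishing",
    "per": "http://example.org/people",
}

# The namespace, not the bare tag name, tells the two apart.
book_title = root.findtext("pub:title", namespaces=ns)
job_title = root.findtext("per:title", namespaces=ns)

print(book_title)
print(job_title)
```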

In addition to these core standards, a number of supplemental 
standards have been developed or are proposed to codify how additional 
functions should be performed. When developers identify a need for new 
functions to be incorporated into XML technology, new supplemental 
specifications can be developed as extensions to the core XML 
standards. These supplemental specifications have been designed as 
separate standards so that they can be used when needed as modular 
enhancements to individual implementations. Examples of supplemental 
technical standards include the following: 

* The Document Object Model (DOM) is a platform-independent and 
language-neutral application-programming interface. DOM allows 
programmers to develop applications that can dynamically access and 
update the content and structure of XML documents. 

* The XML Linking Language (XLink) standard allows XML documents to 
contain links similar to HTML hyperlinks. While XLink is similar to 
HTML linking, it adds new features to make links more flexible and 
precise. For example, XLink allows a link to point to a specific 
reference within an external file rather than simply pointing to the 
file as a whole, as in HTML. 

* XML Path Language (XPath) provides a common syntax and semantics for 
addressing specific parts of an XML document. XPath gets its name 
through its use of a path notation for navigating through the 
hierarchical structure of an XML document. 
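
Two of these supplemental standards can be tried with Python's 
standard library, as in the sketch below. It assumes the limited XPath 
subset supported by the xml.etree.ElementTree module and the DOM 
implementation in xml.dom.minidom; the applicant records and the 
"state" attribute are invented for illustration. 

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical records used only for this example.
root = ET.fromstring("""
<applicants>
  <applicant state="VA"><Name>Pat Doe</Name></applicant>
  <applicant state="MD"><Name>Lee Roe</Name></applicant>
</applicants>
""")

# XPath-style path notation: step down the hierarchy and filter on
# an attribute value.
names_in_va = [n.text for n in root.findall("./applicant[@state='VA']/Name")]

# DOM: load a document as a tree of nodes, then update it in place.
dom = minidom.parseString("<Name>Pat Doe</Name>")
dom.documentElement.firstChild.data = "Pat Q Doe"
updated = dom.documentElement.toxml()

print(names_in_va)
print(updated)
```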

XML Was Designed to Accommodate Numerous Extensions: 

An important advantage of XML is that it is flexible enough to 
accommodate an unlimited number of uses. Each new use is accommodated 
by the development and standardization of extensions to the core set 
of XML standards. This is what makes XML "extensible"; its structure 
can be adapted (or extended) to meet many different needs. 

In addition to the supplemental technical standards already discussed, 
XML can accommodate extensions to suit the needs of specific 
communities of users, such as chemists, travel agents, and numerous 
others. As a result, many efforts are under way to define specialized 
tags and other XML data structures and processing protocols to suit a 
variety of specific business purposes. For example: 

* Electronic business XML (ebXML) is being developed as a complete, 
modular suite of specifications to enable the conduct of business over 
the Internet. 

* Mathematicians have created an extension of XML, called the 
Mathematical Markup Language, that allows them to insert equations 
into Web pages that can then be copied into specialized software 
applications and immediately used for calculations. The W3C has 
approved the Mathematical Markup Language as a standard. 

* The HR-XML Consortium, an industry coalition, is developing XML 
vocabulary and data structures to meet the needs of the human capital 
field, including such functions as exchange of staffing data and 
payroll transactions. 

* The Extensible Business Reporting Language (XBRL) was developed by a 
consortium of industry and public sector organizations as a standard 
for reporting and analysis of financial information. 

XML Can Enhance Information Search, Retrieval, and Analysis: 

If widely implemented using consistent data definitions, XML can be a 
very effective tool to facilitate searching for, identifying, and 
integrating information from different and perhaps unfamiliar sources. 
For example, because XML uses data tags (as discussed earlier), it can 
be used for more precise data queries and collections, both locally 
(for a specific organization) and across the Internet. XML's data tags 
can be used to precisely identify individual data elements, allowing 
XML-based systems to collect and integrate specific types of data 
relatively easily from a variety of sources and create reports or 
support other kinds of analysis that otherwise might require a much 
more labor-intensive effort. For example, the federal government 
annually produces many reports with large amounts of tabular data, 
such as cost figures and other numerical statistics. If tagged in XML 
using agreed-upon data definitions, specific data elements could be 
located within these tables, retrieved, and recombined to form a new 
kind of analysis. In fact, the data could be dynamically retrieved 
each time the analysis was examined, if up-to-the-minute information 
were desired. Officials from the EPA and other federal agencies are 
currently working on a centralized Web site for federal government 
statistical information, called FedStats, with the objective of using 
XML to provide this kind of capability. 
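
A minimal sketch of retrieving and recombining tagged figures from 
separate reports appears below. The two report fragments, the "cost" 
tag, and its "program" and "year" attributes are hypothetical and 
assume that both reports follow the same agreed-upon data definitions. 

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments from two separately published reports.
report_a = (
    "<report>"
    "<cost program='roads' year='2001'>120.5</cost>"
    "<cost program='parks' year='2001'>30.0</cost>"
    "</report>"
)
report_b = "<report><cost program='roads' year='2001'>79.5</cost></report>"

# Pull every tagged cost figure and total it by program.
totals = {}
for text in (report_a, report_b):
    for cost in ET.fromstring(text).iter("cost"):
        program = cost.get("program")
        totals[program] = totals.get(program, 0.0) + float(cost.text)

print(totals)
```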

Similarly, XML could be used to enhance general Web search engines. As 
mentioned earlier, the use of data tagging would provide for more 
precise searching than current approaches, which are based on 
relatively crude quantitative measures, such as the frequency of 
occurrence of a given string of text or the proximity of one text 
string to another. Some databases have already been developed to take 
advantage of this feature of XML. The news agency Reuters, for 
example, which has archived over 800,000 news stories, used XML tags 
to classify these into 775 searchable categories. 

Once XML code is written, not only its creators but also external 
parties can potentially reuse it. For example, after Amtrak created an 
XML system to access its application and database system, the 
associated data tags and structures were reused for a voice 
recognition reservation system. According to XML experts, additional 
cost savings may be realized in the future as well, because it will 
likely be easy for new systems and applications to recognize and make 
use of XML data. 

XML's extensibility also facilitates interaction among a variety of 
devices. The same XML document can be interpreted through different 
style sheets to suit any number of different display devices. Figure 3 
illustrates this benefit. 

Figure 3: XML Can Facilitate the Use of Different User Interfaces and 
Display Devices: 

[Refer to PDF for image: illustration] 

XML document: 

Style sheets: 

Handheld device; 
Printer; 
Desktop computer; 
Voice browser. 

Source: GAO. 

[End of figure] 
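
The idea behind figure 3 can be approximated in a few lines. In the 
sketch below, two ordinary Python functions stand in for style sheets; 
an actual implementation would more likely use XSL, which is not part 
of Python's standard library. The notice document and the function 
names are hypothetical. 

```python
import xml.etree.ElementTree as ET

# One source document; the content is invented for illustration.
document = ET.fromstring(
    "<notice>"
    "<headline>Office closed Monday</headline>"
    "<body>All branch offices will reopen Tuesday at 9 a.m.</body>"
    "</notice>"
)


def render_desktop(doc):
    """Stand-in for a desktop style sheet: full content as HTML."""
    return "<h1>{}</h1><p>{}</p>".format(
        doc.findtext("headline"), doc.findtext("body")
    )


def render_handheld(doc):
    """Stand-in for a handheld style sheet: headline only, upper case."""
    return doc.findtext("headline").upper()


print(render_desktop(document))
print(render_handheld(document))
```

The data are written once; only the presentation rules differ by 
device. 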

XML Usage Complements Traditional Electronic Data Interchange 
Applications: 

XML does not represent the first attempt by IT developers (or the 
federal government) to standardize the process of data exchange. The 
EDI[Footnote 6] standards were also developed for this purpose, but 
their use has been limited. EDI has been implemented mostly by large 
organizations, which have the resources to buy the custom software 
generally required and to set up private communications networks. 
Another obstacle to implementing EDI is that it requires individuals 
with specialized knowledge to perform tasks such as converting an 
organization's business data into the correct formats of the 
transmission standard, an often complex and time-consuming process. In 
contrast, XML has the potential to be more widely adopted, since it 
was designed to use the Internet's data communications infrastructure, 
which is already in place. 

The EDI set of standards consists of electronic message formats for 
many business-related documents used in electronic transactions. 
Figure 4 is an example of an EDI-formatted "Request for Quotation" 
that adheres to the American National Standards Institute (ANSI) 
Accredited Standards Committee (ASC) X12 EDI standard. As the figure 
shows, data in an EDI-formatted document are cryptic. This is a major 
difference between EDI and XML, which uses simple text files and tags 
that are intended to convey readily understandable meaning (see figure 
2). The cryptic format of EDI standards serves as an impediment to 
their broad adoption, because extensive, specialized knowledge is 
required to interpret EDI messages, troubleshoot problems, and adapt 
existing systems to conform to the standards. 

Figure 4: A "Request for Quotation" Formatted as an EDI Message: 

[Refer to PDF for image: illustration] 

ISA*00*	*00*	*ZZ*GATEC	*ZZ*PUBLIC	*960508*...
GS*RQ*GATEC*PUBLIC*960508*1237*000721330*X*003010 ST*840*000721331
BQT*00*F3360196T7174001*960508*106*960509
REF*IL*FM230061280242
PER*IC**EM*F33601 @EC099.LLNL.GOV
DTM*002*960517 
P01*1*54*BX***FT*8940*SI*5499*FS*8940011728888*MF*SANDOZ ... 
l*MF*SANDOZ NUTRITION*MG*NDE 00212-4580-01 PID*F****SUPPLEMENT, 
TOLEREX, DIETARY,
CTT*1
SE*16*000721331 GE*1*000721330 IEA*1*000721332 

Source: Department of Defense. 

[End of figure] 
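
To make the contrast with tagged data concrete, the sketch below 
splits one segment from figure 4 on its asterisk delimiters and wraps 
each field in a tag. The function name segment_to_xml and the 
position-based element names (such as DTM01) are assumptions made for 
illustration; the sketch does not interpret what each field means, 
which would require the specialized knowledge of the standard 
discussed above. 

```python
import xml.etree.ElementTree as ET


def segment_to_xml(segment):
    """Wrap each asterisk-delimited field of one EDI segment in a tag."""
    fields = segment.split("*")
    seg_id = fields[0]
    element = ET.Element(seg_id)
    for position, value in enumerate(fields[1:], start=1):
        if value:  # skip empty positions
            # Hypothetical naming: segment ID plus two-digit position.
            child = ET.SubElement(element, "{}{:02d}".format(seg_id, position))
            child.text = value
    return ET.tostring(element, encoding="unicode")


# One segment taken from figure 4.
print(segment_to_xml("DTM*002*960517"))
```

Even in tagged form, position-based names remain cryptic; the 
readability described for XML depends on vocabularies with 
descriptive, agreed-upon tag names. 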

EDI has been the primary data format used by large organizations to 
transfer business data among themselves, and it continues in 
widespread use. After an extensive effort to participate in and 
encourage the development of EDI standards, key federal government 
agencies such as the Department of Defense (DOD) and General Services 
Administration (GSA) adopted EDI as the standard format for data 
interchange for a number of their business systems. However, smaller 
federal agencies generally have not made the same commitment to EDI. 
Lacking the necessary skills and resources, many small and midsize 
companies also have not adopted EDI. Accordingly, EDI-enabled 
organizations have been unable to conduct automated electronic 
business with those organizations that have not developed the same 
capability. As a result, EDI has not attained universal use as a data 
exchange standard. 

According to reports from Giga Information Group[Footnote 7] and the 
Logistics Management Institute,[Footnote 8] XML is not a replacement, 
but a complementary technology for EDI. Although both EDI and XML can 
be used to accomplish the same basic task—facilitating the transfer of 
business data from one system to another—each technology has 
advantages and disadvantages. Depending on business needs, the two can 
be used together, particularly if companies have already invested in 
EDI methodologies. The convergence of EDI and XML can provide a 
potentially lower cost alternative for small and midsize companies to 
conduct business with federal agencies that already have traditional 
EDI systems in place. 

One advantage of EDI is that a full suite of standards is already in 
place to support business transactions. For example, figure 5 depicts 
the typical flow of electronic documents between a buyer and seller in 
an acquisition process using ANSI ASC X12 EDI transactions. 

Figure 5: Typical Flow of Business Transactions Based on EDI Standards: 

[Refer to PDF for image: illustration] 

Buyer: Request for quote (RFQ) (840) to Seller; 

Buyer: Technical specifications (841) to Seller; 

Seller: Response to RFQ (843) to Buyer; 

Buyer: Purchase Order (P.O.) (850) to Seller; 

Seller: P.O. Acknowledgment (855) to Buyer; 

Buyer: P.O. Change (860) to Seller; 

Seller: P.O. Change Acknowledgment to Buyer; 

Seller: Invoice (810)/Advance Ship Notice (856) to Buyer; 

Seller: Freight bill (210) to Buyer; 

Buyer: Receiving Advice (861) to Seller; 

Buyer: Payment Order/Remittance Advice (820) to Seller; 

Buyer: Electronic Funds Transfer to Seller. 

Buyer overall flow: 
Purchasing; 
Receiving; 
Financial. 

Seller overall flow: 
Sales order entry; 
Shipping; 
Financial. 

Source: Department of Defense. 

[End of figure] 

XML has the potential to lower costs for data exchange because it can 
take advantage of the Internet's communications infrastructure and 
protocols.[Footnote 9] EDI, on the other hand, was developed before 
the Internet became commonplace and thus has generally involved buying 
customized software and setting up expensive, private communications 
networks. These features have some advantages: the dedicated links 
associated with private communications networks are generally more 
reliable than a simple Internet connection, and the condensed format 
of EDI transactions makes it possible to transmit them much more 
efficiently than XML documents. However, the expense involved in 
attaining this capability is likely prohibitive for many applications. 
Table 3 provides a summary comparison of the major features of EDI and 
XML. 

Table 3: Comparison of EDI and XML: 

Difference: 

EDI: Is based on industrywide EDI business standards, such as EDIFACT 
and ANSI X12, that are well-established, providing standard electronic 
formats for electronic transactions. 
XML: Lacks a complete set of business standards to support XML-based 
electronic transactions that are broadly agreed upon. 

EDI: Uses highly structured predefined formats that have specific, 
narrowly defined purposes. 
XML: Has the flexibility to allow new vocabularies to be defined to 
meet changing business needs. 

EDI: Originally designed to rely on private networks known as "value-
added networks" for data exchange. 
XML: Designed to take advantage of the Internet's capabilities and 
existing protocols for data exchange. 

EDI: Supports data exchange only. 
XML: In addition to data exchange, supports other data handling 
functions, such as content management and sophisticated Web searches. 

Similarities: 

Both standards are freely available and nonproprietary; 

Both facilitate data exchange between disparate computer applications; 

Both allow developers to add proprietary extensions to their specific 
implementations. 

[End of table] 

Federal XML Projects Vary in Size and Scope: 

XML is being broadly implemented, both commercially and within 
government. In the private sector, the Giga Information Group 
published the results of a survey to gauge the adoption of XML among 
its client base in April 2001.[Footnote 10] Based on responses from 80 
businesses ranging from banking and insurance to health care and 
manufacturing, 81 percent said they had begun using XML in their 
organizations. Of the 18 percent of respondents who said they had not, 
76 percent planned to use XML within the next year. The primary 
reported uses of XML were for enterprise application integration and 
business data exchange. Other areas of usage included data 
integration, publishing, content management, portals, and application 
development. 

Federal XML projects undertaken to date have varied significantly in 
size and scope. In some cases, agencies have used XML to enhance data 
exchange within relatively narrow communities of interest with well-
defined data exchange requirements. The Securities and Exchange 
Commission's (SEC) Electronic Data Gathering, Analysis, and Retrieval 
(EDGAR) system and Amtrak's reservation system are two examples. In a 
few other cases, concerted efforts have been made to define XML-
related data standards (or design a process for doing so) for larger 
communities of interest. Specifically, the Department of Justice has 
developed a set of definitions for basic data elements shared by 
several law enforcement information networks. Similarly, EPA has been 
working with state environmental agencies to develop XML data 
standards for a national network of environmental information. Several 
efforts are also under way within DOD to develop a common 
infrastructure to support the use of XML across the department. 

Securities and Exchange Commission: 

In the SEC's case, agency officials made the decision to design their 
modernized EDGAR system to use XML for all external data exchanges as 
well as internal processing. However, as it is currently operating, 
EDGAR continues to use other more commonly known document formats 
because many external systems that interact with EDGAR are not yet XML-
compliant. 

According to agency officials, since 1992, the SEC has used EDGAR to 
electronically collect the financial and other business information 
that public companies are required by law to submit on a regular 
basis. As part of a larger modernization effort, the SEC in April 2001 
began requiring that submissions be formatted with headers encoded in 
XML. The agency's EDGARLink client software, distributed to filers at 
no charge, uses a specialized vocabulary called the Extensible Forms 
Description Language to format headers in XML for transmission to the 
SEC. Although SEC officials have not quantified any cost savings 
associated with implementing XML, they believe its use has saved the 
agency software development expenses, because filers now use a 
commercial off-the-shelf product to format their submissions, instead 
of custom software, as had been previously required. According to SEC 
officials, third-party software developers should also be able to 
reduce costs by using commercial XML products to format submissions. 

SEC officials stated that their use of XML to date has been limited to 
functions that did not require coordination with other government or 
private sector organizations. Because the SEC provides filers with 
copies of the XML-formatting software at no charge, it has been able 
to fully control how XML is implemented in the software and what 
specific vocabulary is used. The Extensible Forms Description Language 
that was used has been submitted to the W3C as a proposed standard but 
has not yet been approved. 

SEC officials would like to broaden the use of XML to cover all the 
data in EDGAR filings rather than just header information. Doing so 
would take much fuller advantage of XML's strengths and allow 
investors to better access financial data and automatically perform 
many kinds of analyses. However, to do so would require agreement on a 
complete vocabulary of data tags and schemas for describing financial 
statement information, which could require coordinating with other 
groups such as the XBRL.org consortium, which is also developing 
business vocabularies related to financial reporting. Further, in 
addition to agreeing upon a standardized vocabulary, developers would 
need to make software available to format financial information 
according to the standards so that it would not be burdensome for 
filers to conform to the standard vocabulary. Since none of this has 
yet happened, SEC officials believe it is not in the best interest of 
filers to levy an XML requirement at this time. 
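
As a rough illustration of the kind of automated analysis that fully tagged financial data could permit, the following Python sketch computes a simple ratio across several filings. The tag names (filing, CurrentAssets, CurrentLiabilities), the company names, and the figures are hypothetical placeholders invented for this example; they are not drawn from EDGAR, XBRL, or any approved vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged filings; names and values are illustrative only.
FILINGS = """\
<filings>
  <filing company="Alpha Co">
    <CurrentAssets>500</CurrentAssets>
    <CurrentLiabilities>250</CurrentLiabilities>
  </filing>
  <filing company="Beta Inc">
    <CurrentAssets>300</CurrentAssets>
    <CurrentLiabilities>400</CurrentLiabilities>
  </filing>
</filings>
"""


def current_ratios(xml_text):
    """Return {company: current assets / current liabilities}."""
    ratios = {}
    for filing in ET.fromstring(xml_text).iter("filing"):
        # Because each value is tagged, no layout-specific parsing is needed.
        assets = float(filing.findtext("CurrentAssets"))
        liabilities = float(filing.findtext("CurrentLiabilities"))
        ratios[filing.get("company")] = assets / liabilities
    return ratios
```

The sketch works only because every filing uses the same tag names, which is the agreement on a standardized vocabulary that the SEC officials described as not yet in place.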

Amtrak: 

Amtrak, a federally chartered corporation, has successfully used XML 
to enhance its reservation system, according to Amtrak officials. 
However, in doing so, officials say they have consciously taken the 
risk that their self-defined data structures may not match industry 
standards that emerge in the future. According to Amtrak officials, 
the use of XML has streamlined software development, including 
reducing costs, and produced an easier set of specifications for 
travel agencies to address when developing or modifying their own 
systems. In moving to XML, Amtrak officials found that they were the 
first in the railroad industry to attempt to convert their data to XML 
format, and thus they were free to define data tags as they wished. 
They decided to base their definitions on specifications developed by 
the OpenTravel Alliance[Footnote 11] but found that those 
specifications were not sufficiently articulated to meet all of 
Amtrak's needs. As a result, Amtrak defined new tags for rail 
reservations purposes when none were available. Amtrak officials told 
us that they expect the OpenTravel Alliance to continue to develop its 
specifications, and tags may be standardized that are incompatible 
with Amtrak's. In that case, Amtrak will likely have to modify its 
system to meet the new industry standards. 

Department of Justice: 

The Department of Justice reported in October 2001 that it had taken 
steps to move beyond single-system implementations of XML and 
facilitate broader information sharing and integration of justice 
information systems nationwide.[Footnote 12] The need for effective 
data sharing among law enforcement agencies has been highlighted by 
the department's recent heightened efforts to combat the threat of 
terrorism. According to its October 2001 report, the department's 
experience to date shows that defining and implementing XML data 
standards across more than one organization is a complex process that 
requires a concerted effort. 

Until recently, elements within the department had been working on 
three separate XML-related data standardization efforts: (1) a 
standard format for criminal histories, (2) a standard for law 
enforcement agencies to share criminal intelligence information, and 
(3) a data standard for electronic court filings. In June 2001, the 
department's working group on infrastructure and standards undertook 
an effort to reconcile the separate data tags and definitions that the 
three initiatives had developed. According to the department's lessons 
learned report, the reconciliation effort was an intense process that 
required the close cooperation of all participants. For example, in 
the beginning, the working group found that the three existing 
standards diverged in important ways for many basic data structures, 
such as how to represent individuals' names. Initially, 
representatives from the three different communities were reluctant to 
make changes in the existing definitions to accommodate a broader 
standard. However, ultimately the group was able to develop a draft 
"XML Justice Data Dictionary" containing 128 data elements. 
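
To make the reconciliation problem more concrete, the following Python sketch maps two divergent representations of a person's name onto one common structure. All element names here (Subject, Name, Filer, PersonName, GivenName, and so on) are assumptions made for illustration; they are not taken from the three Justice standards or from the draft data dictionary.

```python
import xml.etree.ElementTree as ET


def from_single_field(xml_text):
    """Parse a hypothetical record holding 'Family, Given Middle' in one tag."""
    full = ET.fromstring(xml_text).findtext("Name")
    family, _, rest = full.partition(",")
    parts = rest.split()
    return {
        "given": parts[0] if parts else "",
        "middle": " ".join(parts[1:]),
        "family": family.strip(),
    }


def from_split_fields(xml_text):
    """Parse a hypothetical record that tags each part of the name separately."""
    person = ET.fromstring(xml_text).find("PersonName")
    return {
        "given": person.findtext("First", default=""),
        "middle": person.findtext("Middle", default=""),
        "family": person.findtext("Last", default=""),
    }


def to_common(record):
    """Serialize a parsed name using one shared (hypothetical) set of tags."""
    element = ET.Element("PersonName")
    for tag, key in (("GivenName", "given"),
                     ("MiddleName", "middle"),
                     ("FamilyName", "family")):
        ET.SubElement(element, tag).text = record[key]
    return ET.tostring(element, encoding="unicode")
```

Each additional source format requires its own translation routine like the two above, which suggests why agreeing on a single shared definition, although difficult, can reduce the long-term burden of exchanging data.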

Justice faces additional challenges in ensuring that its newly 
standardized data elements are broadly adopted. The department plans 
to establish an XML registry for these data elements but has not done 
so yet. Nor has a decision been made about working to integrate these 
elements into a developing commercial standard vocabulary, such as 
Legal XML. Both actions may be needed to promote the use of the 
department's data elements in law enforcement systems. 

Environmental Protection Agency: 

Like Justice, EPA has attempted to work within its community of 
interest—state environmental protection agencies—to build an 
infrastructure for common access, both locally and nationally, to 
environmental information, according to EPA officials. EPA is required 
by law to collect a large volume of information from the states in 
order to carry out its mandated functions, including oversight of 
state-level programs and administration of national programs. Since 
1998, EPA and the states have been working on developing a National 
Environmental Information Exchange Network, using the Internet and 
standardized data templates, written in XML, to facilitate the 
exchange of data among participating partners. According to EPA 
officials, the network will be largely in place in fiscal year 2003, 
when templates are to be in place for priority data flows and a large 
number of the states are expected to be participating. 

In addition, EPA officials report that they have taken steps to 
promote uniform internal implementation of XML. The agency established 
an XML technical advisory group as a forum for sharing advice and 
guidance about implementing XML. The group has focused on education 
and outreach. In addition, EPA officials said they are developing an 
XML registry to support the agency's Central Data Exchange facility, 
which they plan to have operational in April 2002. 

Department of Defense: 

Officials in DOD foresee the potential use of XML in many of the 
department's systems and reported that they are taking action to 
promote interoperability of these systems and reuse of XML data 
components, both "vertically" within individual projects and 
"horizontally" across departmental organizations. Three major efforts—
at the Defense Information Systems Agency (DISA), the Defense 
Logistics Agency, and the Department of the Navy—are focused on 
standardizing the implementation of XML. 

DISA is promoting what officials call a "market-based" approach to 
standardizing the use of XML. According to this strategy, DISA will 
provide a central data clearinghouse—including an XML registry of 
standard data elements, definitions, and structures—where systems 
developers can come to share data elements and structures that they 
have developed or to locate existing ones that can meet their needs. 
The registry is designed to accommodate a number of different levels 
of compliance for different applications. DISA officials said they 
have created distinct domains within their clearinghouse where 
specific DOD communities of interest—such as personnel, finance and 
accounting, and military intelligence—can define their unique data 
structures. The agency has already established this data clearinghouse 
and has defined a management process for collecting, storing and 
disseminating XML components such as schemas, elements, attributes, 
DTDs, and style sheets. According to DISA officials, DOD is 
considering adopting this clearinghouse, together with the processes 
for managing it, for use in all departmental systems. 

The Defense Logistics Agency's Defense Logistics Information Service, 
which handles large quantities of information about military 
logistics, has been developing a repository of data structures related 
to logistics. According to agency officials, the service established 
an internal XML working group that initially identified the XML-based 
data exchange requirements of its customers and developed standard 
data definitions and structures based on those requirements. Officials 
said that the service is currently at work identifying its internal 
needs for an XML registry, evaluating commercial software tools, and 
assessing how it should interact with external systems, such as DISA's 
registry. 

The Department of the Navy established an XML working group in August 
2001 to provide leadership and guidance in maximizing the value of XML 
across the Navy. According to Navy officials, the group's initial 
activities have been to develop interim Navy XML policy and prepare an 
initial Navy XML developer's guide. The developer's guide is currently 
in draft form and is planned for official release in the first quarter 
of 2002. The group's goals for the developer's guide are to provide 
enough specific guidance to developers to ensure that they "move in 
the right direction," while being general enough to minimize the 
chance of conflict with future guidance. 

Objectives, Scope, and Methodology: 

Our objectives were to assess (1) the overall development status of 
XML standards to determine whether they are ready for governmentwide 
use and (2) challenges faced by the federal government in optimizing 
its adoption of XML technology to promote broad information sharing 
and systems interoperability. 

To address our objectives, we reviewed documentation and held 
discussions with representatives from the Chief Information Officers 
(CIO) Council's XML Working Group and key experts from the private 
sector, including KPMG, the Logistics Management Institute, and 
Microsoft Corporation. The XML Working Group is responsible for 
planning, accelerating, facilitating, and bringing about effective and 
appropriate implementation of XML technology in the information 
systems of the federal government. The key experts we contacted from 
the private sector are actively involved in one or more XML 
initiatives that may benefit the federal government. 

To evaluate the maturity of XML standards for potential governmentwide 
adoption, we identified and assessed the progress of major 
nongovernmental standards activities, including those of the W3C, the 
Organization for the Advancement of Structured Information Standards 
(OASIS), the United Nations Center for the Facilitation of Procedures 
and Practices for Administration, Commerce, and Transport (UN/CEFACT), 
and RosettaNet. 

We also held discussions with and reviewed documents from the XML 
Working Group, GSA, EPA, the National Archives and Records 
Administration (NARA), the National Institute of Standards and 
Technology (NIST), DOD, Justice, SEC, and Amtrak. These discussions 
and documents formed the basis for our assessment of the (1) progress 
of the federal government in planning and coordinating federal XML 
initiatives and (2) remaining challenges to be overcome in 
implementing XML technology throughout the government. In addition, we 
researched and reviewed documentation on XML prepared by the 
government of the United Kingdom, the National Electronic Commerce 
Coordinating Council, and the National Association of State Chief 
Information Officers. 

We performed our review in accordance with generally accepted 
government auditing standards, working from April 2001 through January 
2002, at various locations, including GSA Headquarters in Washington, 
D.C.; NARA Archives II in College Park, Maryland; and NIST 
Headquarters in Gaithersburg, Maryland. 

[End of Chapter 1] 

Chapter 2: A Comprehensive Set of Standards for Implementing XML Is 
Only Partially in Place: 

Key technical standards for XML have been largely worked out under the 
auspices of the World Wide Web Consortium (W3C). These technical 
standards are focused on providing the generic structure and tools to 
tag data, transmit it over the Internet, and allow it to be processed 
by the computer systems that receive it. 

Business standards, though equally important, are generally less well-
developed, and reaching agreement on them is proving to be difficult 
when multiple communities of interest are involved. Business standards 
are needed to provide a more complete framework for conducting 
business over the Internet, including advertising products and 
services so that potential buyers and sellers can find each other, 
proposing and agreeing upon electronic transactions, and executing the 
agreed-upon transactions. 

Business standards are also needed to define vocabularies for the 
specific data elements that are to be exchanged when transactions are 
conducted. These vocabularies, once fully developed, may also be 
useful to the government in certain cases. However, many of these 
potentially useful standard vocabularies are still in the initial 
stages of development and do not provide all the data structures 
needed to support current government needs. 

XML Technical Standards Have Largely Been Defined: 

The W3C organization has completed development of a suite of core 
technical standards for XML, as well as a number of functional 
extensions. As table 4 shows, a number of core technical standards 
have been approved as official "recommendations" by the W3C.[Footnote 
13] In addition, various functional extensions are currently in 
development, such as XPointer, which defines how individual parts of a 
document are addressed; XQuery, which is a language for retrieving and 
interpreting information from diverse sources; and SOAP (Simple Object 
Access Protocol), which allows software programs to access and 
communicate with each other over a network such as the Internet. 

Table 4: XML Technical Standards as of February 2002: 

Technical standard: Extensible Markup Language (XML) 1.0; 
Description: Core standard for XML language; 
Comments: 1st edition approved for implementation February 1998; 2nd 
edition approved October 2000. 

Technical standard: Extensible Stylesheet Language (XSL); 
Description: Core standard for formatting XML documents; 
Comments: V 1.0 approved for implementation, October 2001. 
	
Technical standard: XML Schema; 
Description: Core standard for specifying the structure, content, and 
semantics of XML documents; 
Comments: Approved for implementation, May 2001.	 

Technical standard: XML Namespaces; 
Description: Core standard for defining unique identifiers to qualify 
elements and attributes that may use the same name; 
Comments: Approved for implementation, January 1999. 

Technical standard: Document Object Model (DOM); 
Description: Generic method to dynamically access and update 
structure, content, and style of XML documents; 
Comments: Level 1 approved October 1998; Level 2, November 2000. Work 
under way on Level 3. 

Technical standard: XML Path Language (XPath); 
Description: Syntax to address specific parts of an XML document; 
Comments: V 1.0 approved, November 1999. 

Technical standard: XML Linking Language (XLink); 
Description: Language defining how one document links with another 
document; 
Comments: V 1.0 approved, June 2001. 

Technical standard: Associating Style Sheets with XML Documents; 
Description: Specification providing a method for associating a style 
sheet with an XML document; 
Comments: V 1.0 approved, June 1999. 

Technical standard: Canonical XML; 
Description: Specification describing a method to determine whether 
two XML documents are identical or whether an application has changed 
a document; 
Comments: V 1.0 approved, March 2001. 

Technical standard: XML Base; 
Description: Syntax to define base locations that contain parts of XML 
documents; 
Comments: V 1.0 approved, June 2001. 

Technical standard: XML Information Set; 
Description: Set of definitions for use by other specifications that 
need to refer to information in an XML document;
Comments: Approved, October 2001. 

Technical standard: XML-Signature Syntax and Processing; 
Description: Syntax and processing rules for creating and representing 
digital signatures in XML documents; 
Comments: Approved, February 2002. 

[End of table] 
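
As a small illustration of how two of the standards in table 4 work together, the following Python sketch uses XML Namespaces to distinguish two elements that share the name "name" and uses path expressions to address each one. The document, its element names, and the example.org namespace identifiers are hypothetical. Python's standard xml.etree.ElementTree module, used here, supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which two vocabularies both define <name>.
# The namespace URIs are illustrative placeholders, not real vocabularies.
DOC = """\
<filing xmlns:co="http://example.org/company"
        xmlns:pr="http://example.org/person">
  <co:name>Example Corp</co:name>
  <officer>
    <pr:name>Jane Doe</pr:name>
  </officer>
</filing>
"""

# Prefix-to-URI map used when evaluating the path expressions below.
NS = {
    "co": "http://example.org/company",
    "pr": "http://example.org/person",
}


def names_by_vocabulary(xml_text):
    """Return (company name, list of person names) from the document."""
    root = ET.fromstring(xml_text)
    company = root.find("co:name", NS).text          # direct child only
    people = [e.text for e in root.findall(".//pr:name", NS)]  # any depth
    return company, people
```

Without the namespace qualifiers, a receiving system could not tell from the tag alone whether a given "name" element referred to a company or to a person.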

Based on progress to date, W3C technical standards for XML are 
relatively mature, even though work is still in progress on 
supplemental standards. Most of the core technical standards were 
approved within 2 years of being initially proposed, and the fact that 
commercial products are increasingly being made compatible with XML 
appears to indicate that the private sector is in general agreement 
with XML's basic technical infrastructure. For example, vendors 
providing XML-compatible products include such companies as Ariba, 
Commerce One, IBM, Mercator, Microsoft, Oracle, Sun, and WebMethods. 

Additional Standards Have Been Proposed for Using XML to Conduct 
Electronic Business: 

According to industry experts, a suite of business standards beyond 
XML's technical standards is needed in order to enable organizations 
that do not have a previously established methodology for data 
exchange to conduct business and to tap information resources that are 
meant to be shared. Technical standards provide only the generic 
structure and tools to tag data and documents, transmit them over the 
Internet, and process them on the other end. Business standards, in 
contrast, are needed for two reasons. First, a group of standards is 
needed to address the overall process of (1) identifying potential 
business partners for transactions, (2) exchanging precise technical 
information about the nature of proposed transactions so that the 
partners can agree to them, and (3) executing agreed-upon transactions 
in a formal, legally binding manner. In addition to these business 
process standards, a second group of standards is needed to codify the 
precise types of data elements that are to be exchanged when a 
business transaction is conducted. This need is being answered by the 
development of data vocabularies (or languages) designed to meet the 
needs of specific businesses and professions. 

Business process standards aim to capture electronically all the 
critical aspects of arranging and conducting a business transaction. 
For two organizations that have not made detailed arrangements in 
advance, conducting business transactions over the Internet requires a 
series of information exchanges that help define proposed transactions 
in precise terms and then reliably confirm that they have taken place. 
Individual companies first need to identify each other and share 
information about the products and services they offer. They must then 
agree upon which business processes and documents are necessary to 
carry out a proposed transaction, including determining how the 
exchange of information will take place and its contractual terms and 
conditions. Once all this is accomplished, they need to reliably 
exchange business information, products, and services according to 
these agreements. 

Many of these processes can be captured generically for the activities 
of most businesses, although there will also be activities that are 
unique to certain kinds of businesses or certain specialized 
information exchanges. Examples of specifications that address generic 
business processes include the following: 

* Electronic business XML (ebXML) provides a method for companies to 
exchange business messages and data, conduct transactions, and define 
and register business processes. 

* RosettaNet provides vocabularies and business process models (e.g., 
inventory management and product review) for the electronics industry. 

* Universal Description, Discovery, and Integration (UDDI) provides 
directories for Web services description and discovery. Using UDDI, 
companies can discover each other and define how they will interact 
and share information over the Internet. 

In addition to business process standards, standard data vocabularies 
(or languages) will be needed for particular industries, professions, 
and other specific domains. Table 5 shows a representative sample of 
industry-specific efforts. Hundreds of such projects have been 
registered with the xml.org Web portal, which serves as a repository 
for industry XML information. 

Table 5: Representative Industry-Specific XML Vocabularies: 

Vocabulary name: Bioinformatic Sequence Markup Language (BSML); 
Description: Supports the encoding and display of DNA, RNA, and 
protein sequence information. 

Vocabulary name: Chemical Markup Language (CML); 
Description: Addresses needs of the chemical industry, such as data 
tags that can be used to accurately represent chemical formulas. 

Vocabulary name: Extensible Business Reporting Language (XBRL); 
Description: Supports financial information, reporting, and analysis. 

Vocabulary name: Geography Markup Language (GML); 
Description: Supports the transport and storage of geographic 
information, including both the geometry and properties of geographic 
features. 

Vocabulary name: HR-XML; 
Description: Supports human capital management functions such as 
exchange of staffing data and payroll transactions. 

Vocabulary name: Legal XML; 
Description: (In development) Will support the legal and legislative 
profession, especially for electronic court filings. 

Vocabulary name: Mathematical Markup Language; 
Description: (W3C standard) Facilitates the use and re-use of 
mathematical and scientific content on the Web. 

Vocabulary name: OpenTravel Alliance; 
Description: (In development) Will provide a commonly accepted 
communications process for the travel and transportation industry. 

Vocabulary name: Spacecraft Markup Language (SML); 
Description: Provides standard definitions of XML tags and concepts of 
structure to allow the definition of spacecraft and other support data 
objects. 

Vocabulary name: Wireless Markup Language (WML); 
Description: Facilitates the specification of content and user 
interface for electronic devices such as cellular phones and pagers. 

[End of table] 

Business Process Standards Are Less Well-Developed than Technical 
Standards: 

Ideally, a well-defined set of XML business process standards covering 
all key requirements of business data exchanges should be established 
and universally agreed upon. In conjunction with these basic business 
standards, individual industries would adopt standard vocabularies to 
express their unique data types. If agreement on this overall set of 
standards were achieved, systems developers would have the tools they 
need to build systems that capitalize on XML's potential to facilitate 
interoperability. Without such a universally agreed-upon set of 
standards, however, XML's use could be limited to carefully 
prearranged data exchanges with well-established business partners. 

However, business standards are generally less well-developed and 
agreed upon than XML's core technical standards. Unlike XML technical 
standards, all of which are established and maintained by the W3C, 
business standards are developed by a variety of public and private 
sector organizations, including industry consortia, and are not always 
universally supported. For example, a number of different approaches 
to addressing the process of conducting business transactions have 
been proposed. Currently, at least three of them are vying for support 
and offer functionality that is in part overlapping and incompatible. 
These approaches include the following: 

EbXML: 

UN/CEFACT and OASIS have approved a modular suite of ebXML 
specifications that enables the conduct of business over the Internet. 
[Footnote 14] EbXML's goal is to allow any enterprise—of any size or 
in any industry—to conduct business electronically with any other 
entity anywhere in the world. Launched in November 1999, the ebXML 
project finished its initial development phase in May 2001. At that 
time, it established a set of design rules for data dictionaries as 
well as a number of significant reference documents, including a 
technical architecture, business process specification schema, 
registry information model, registry services specification, 
requirements specification, message service standard, and 
collaboration-protocol profile and agreement. Figure 6 shows a 
representative ebXML transaction involving two organizations that 
locate each other through an ebXML registry and then negotiate and 
carry out the transaction based on ebXML specifications. 

Figure 6: Representative ebXML Transaction: 

[Refer to PDF for image: illustration] 
			
1) Organization A advertises services in ebXML registry. 
			
2) Organization B needs services. 

3) Organization B checks registry and repository. 

4) Organizations A and B agree on proposed transaction using ebXML 
messaging services. 

5) Organizations A and B execute transaction. 

Source: GAO. 

[End of figure] 

In public presentations, Office of Management and Budget (OMB) 
officials have expressed an interest in moving the federal government 
to greater use of ebXML. In October 2001, OMB defined standards for 
success in the area of expanding e-government, and ebXML was cited. 
Specifically, OMB called for federal agencies to "minimize burden on 
business by re-using data previously collected or using ebXML or other 
open standards to receive transmissions."[Footnote 15] 

Although many of ebXML's specifications have been approved, 
specifications for "core components" (basic data elements and 
structures that are to serve as common building blocks for use across 
industries and business processes) are still incomplete. Because 
different industries often use different terms to refer to the same 
thing, exchanging information among them can be difficult. Using 
agreed-upon core components as basic elements for building electronic 
business messages could reduce the burden involved in getting these 
divergent systems to interoperate. Software designed to interpret 
business messages composed of standardized core components would then 
be able to operate more broadly across industries, thus increasing 
economies of scale and potentially lowering the cost for small 
businesses to conduct business electronically. 

For example, one component would be an XML data tag structure for 
"bank account," which might consist of an account holder's name and an 
account number. Such a component would find many uses across a wide 
range of business activities and industries. Currently, ebXML has 
published technical reports on the core component methodology and 
framework, but complete specifications have not yet been defined. 
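
The "bank account" example can be sketched in Python as follows. Because complete ebXML core component specifications have not been defined, the tag names below (BankAccount, AccountHolderName, AccountNumber, PaymentInstruction, Amount) are hypothetical and are meant only to show how one component might be defined once and then reused inside a larger business message.

```python
import xml.etree.ElementTree as ET


def bank_account(holder_name, account_number):
    """Build a reusable <BankAccount> component (hypothetical tag names)."""
    account = ET.Element("BankAccount")
    ET.SubElement(account, "AccountHolderName").text = holder_name
    ET.SubElement(account, "AccountNumber").text = account_number
    return account


def payment_instruction(account, amount):
    """Embed the same component in a hypothetical payment message."""
    message = ET.Element("PaymentInstruction")
    message.append(account)
    ET.SubElement(message, "Amount").text = amount
    return message


# Example use: serialize a message that contains the shared component.
example = ET.tostring(
    payment_instruction(bank_account("Jane Doe", "000123"), "100.00"),
    encoding="unicode",
)
```

Software written to interpret the BankAccount structure could then process it wherever it appears, whether in a payment, a payroll record, or an invoice, without industry-specific changes.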

Web Services: 

Several IT companies are supporting the use of a set of standards for 
implementing "Web services." The concept of Web services is to allow 
businesses with on-line offerings to connect to other businesses to 
enhance their offerings with functions provided by those other 
businesses. For example, a company selling merchandise through a Web 
site could connect to a shipping company to automatically make 
shipping arrangements and calculate costs for customers. To form these 
connections, a set of four basic standards has been proposed: XML for 
representing data, UDDI for locating potential business partners on 
the Web and identifying services of interest, SOAP for allowing 
software programs to access and communicate with each other over a 
network such as the Internet, and Web Services Description Language 
(WSDL) for describing what specific functions are available and how 
they can be accessed. 
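
The shipping example above might involve a message such as the one built by the following Python sketch. It wraps a request in a SOAP envelope using the SOAP 1.1 envelope namespace. The shipping service, its namespace (example.org), and the element names GetShippingQuote, DestinationPostalCode, and WeightKg are assumptions made for illustration; in practice they would be taken from the WSDL description published by the shipping company.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SHIP_NS = "http://example.org/shipping"  # hypothetical service namespace


def build_quote_request(postal_code, weight_kg):
    """Return a SOAP request, as text, asking for a shipping quote."""
    ET.register_namespace("soap", SOAP_NS)  # use a readable prefix on output
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The request element and its children are hypothetical.
    request = ET.SubElement(body, f"{{{SHIP_NS}}}GetShippingQuote")
    ET.SubElement(request, f"{{{SHIP_NS}}}DestinationPostalCode").text = postal_code
    ET.SubElement(request, f"{{{SHIP_NS}}}WeightKg").text = str(weight_kg)
    return ET.tostring(envelope, encoding="unicode")
```

The sketch covers only message construction. Locating the service through UDDI and transmitting the message over the network are outside its scope.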

RosettaNet: 

Funded by a consortium of more than 400 companies, including 
corporations such as IBM, Cisco, and Dell, RosettaNet began as an 
effort to create XML standards for the IT supply chain but has 
expanded to include electronic components and semiconductor 
manufacturing. RosettaNet has developed three dictionaries: a business 
dictionary, e-commerce dictionary, and IT technical dictionary. Its 
business dictionary designates the properties used in basic business 
activities, and its technical dictionaries provide the properties for 
defining products. In addition, RosettaNet has developed electronic 
business guidelines in the form of partner interface process 
specifications, which include business models, impact and benefit 
analyses for implementing the business models, technical software 
designs, and implementation guides. RosettaNet has developed partner 
interface process specifications for administration, product and 
service review, product information, order management, inventory 
management, marketing information management, service and support, and 
manufacturing. Even though RosettaNet standards were designed for the 
electronics industry, they offer an approach for defining and modeling 
business processes that others may follow. 

Our discussions with industry experts and review of Web documentation 
indicate that these standards are in different stages of development 
and acceptance. 
RosettaNet appears to be the most fully developed business standard, 
but it is not endorsed by any internationally recognized standards
organization. EbXML has the advantage of the formal backing of 
UN/CEFACT and OASIS, but its suite of specifications is not yet 
complete. For example, the majority of ebXML's initial efforts focused 
on establishing the underlying rules for data dictionaries rather than 
developing the dictionaries themselves. Development began only in 
October 2001 for a common library of business documents for ebXML that 
will enable trading partners to unambiguously identify and exchange 
business information.[Footnote 16] Without these tools, data that are 
exchanged between organizations may not be interpreted and validated 
consistently. 

Because uncertainty remains about which business standards will 
ultimately prevail, applications developed based on any of the current 
proposals may be at risk of being incompatible with future standards. 
In addition, without universally accepted standards, commercial IT 
vendors may use nonstandard XML extensions that could limit 
interoperability. 

Potentially Useful XML Vocabularies Are Not Ready for Governmentwide 
Adoption: 

Within the business standards arena, XML is being used to create a 
variety of "standard" markup languages for particular industries and 
professions, and many of these languages, once fully developed, may be 
useful to the government as well. For example, in the future, federal 
agencies may be able to use HR-XML to exchange data related to human 
resources functions such as staffing exchange, payroll transactions, 
compensation, and background checking. Likewise, agencies may be able 
to use XBRL to format and develop financial statements in the future. 
And Legal XML could be used to create legal documents such as 
legislative and court documents. It is the policy of the federal 
government to use commercial standards whenever practical. However, 
many potentially useful standard vocabularies are still in the initial 
stages of development and do not provide all the data structures 
needed to support current needs. For example, although high-level 
specifications have been developed in HR-XML for several important 
human capital functions, very few specific data elements have been 
specified. Similarly, for XBRL, work has been completed on only one of 
six planned specifications. For Legal XML, no specifications have yet 
been completed. 

HR-XML is being developed by the HR-XML consortium, a nonprofit group, 
to allow employers to reduce the ongoing costs of negotiating human 
capital-related data exchanges on an ad-hoc basis. The consortium has 
focused its efforts on developing a suite of high-level specifications 
for a range of human capital functions, including recruiting and 
staffing, benefits enrollment, payroll, time and expense reporting, 
competencies, and background checking. To date, the specifications for 
all but payroll and background checking have been written. However, 
the consortium has not fully defined a vocabulary of data tags, DTDs, 
and schemas for these functions. 

XBRL is being developed by XBRL.org, an industrywide consortium, and 
is intended to be a standards-based electronic language for financial 
information, reporting, and analysis. In particular, the consortium 
plans to adapt XBRL to a variety of specific applications, including 
financial statements, general ledger, regulatory filings, business 
event reporting, audit schedules, and tax filings. In addition, the 
consortium plans to develop taxonomies (common vocabulary) for 
financial reporting across jurisdictions (e.g., United States, Canada, 
United Kingdom, and Germany) and taxonomies for specific industries 
(e.g., mutual funds, media and entertainment, and agriculture). As of 
this writing, the consortium has completed an XBRL specification for 
financial statements and a taxonomy for financial reporting of 
commercial and industrial companies that reflect the generally 
accepted accounting principles used in the United States. However, 
work on the other specifications and taxonomies has not been 
completed, and existing taxonomies for different communities of 
interest are not completely compatible. 

Legal XML is being developed by a nonprofit organization of the same 
name, made up of volunteers from private industry, nonprofit 
organizations, government, and academia. The organization seeks to 
coordinate activities in both the "vertical" and "horizontal" domains 
of the legal profession. Vertical domains include court filings, 
transcripts, judicial decisions, and public law. Horizontal domains 
include general vocabulary and logical document structure. As of this 
writing, no standards have been completed. 

The fact that many of these vocabularies are still in the early stages 
of development creates challenges for reaching agreement on their use 
for governmentwide or cross-agency functions. Accordingly, the 
governmentwide benefits that may be derived from using these standards 
will not be available in the near term. An apt example is the Human 
Resources Data Network, being developed by an interagency workgroup to 
capture essential workforce information to meet the needs of the 
Office of Personnel Management and other agencies. The planned network 
is intended to (1) replace the paper-based official personnel folders 
that are currently used to document pay, benefits, and work history of 
civilian employees, and (2) serve as a gateway to streamline the 
process by which agencies provide workforce information to the Office 
of Personnel Management. According to project officials, the workgroup 
would like to use commercial standards such as HR-XML to implement the 
planned network, and officials contacted the HR-XML consortium to 
assess the applicability of the standard. However, the HR-XML standard 
is still in early stages of development, with only two approved data 
definitions (for name and address) currently available. In contrast, 
the workgroup has completed a data modeling exercise that identified 
the need to define 984 critical data elements. Unable to wait for 
commercial standards to be developed, the workgroup defined its own 
data structure and vocabulary. Project officials noted that even if a 
fully developed HR-XML vocabulary were available, some of the data 
elements required by the Human Resources Data Network likely would not 
be addressed because they reflect unique government needs. 

[End of Chapter 2] 

Chapter 3: The Federal Government Faces Challenges in Realizing XML's 
Full Potential: 

Although XML offers the potential to greatly facilitate the 
identification, integration, and processing of complex information, a 
number of challenges face the federal government as it attempts to 
take best advantage of the technology's potential. XML system 
developers—both within the federal government and externally—must 
avoid several critical pitfalls when implementing XML, including the 
risk that data will not be adequately defined and that incompatible 
data definitions, vocabularies, and structures will proliferate; the 
potential for proprietary extensions to be built that would defeat 
XML's goal of broad interoperability; and the need to maintain 
adequate security. 

In addition to these pitfalls, which all systems developers must 
address, the federal government faces additional challenges as it 
attempts to gain the most from XML's potential. Specifically, (1) no 
identifiable governmentwide strategy for XML adoption exists to guide 
agency implementation efforts and ensure that agency enterprise 
architectures address adoption of XML. Without agreement on such a 
strategy, agencies risk building and buying systems that will not work 
with each other in the future. (2) The needs of federal agencies have 
not been uniformly identified and consolidated so that they can be 
represented effectively before key standards-setting bodies. If 
federal requirements are not better understood and consolidated, the 
government may be unable to effectively provide input to commercial 
standards while they are still under development. (3) Although work 
has begun on a pilot, the government has not yet fully implemented a 
registry of government-unique XML data structures (such as data 
element tags) that system developers can consult when building or 
modifying XML-based systems. (4) Much also needs to be done to ensure 
that agencies address XML implementation through enterprise 
architectures so that they can maximize its benefits and forestall 
costly future reworking of their systems. 

Implementing XML Presents Pitfalls: 

Although XML offers the potential to greatly facilitate the 
identification, integration, and processing of complex information—
both within the federal government and externally—system developers 
face a number of pitfalls in implementing the technology, including 
the risk that markup languages, tags, DTDs, and schemas will 
proliferate; the potential for proprietary extensions to be built that 
would defeat XML's goal of broad interoperability; and the need to 
maintain adequate security. Regarding the risk that redundant markup 
languages, tags, DTDs, and schemas will proliferate, past experience 
with data interchange has shown that even if a specification such as 
the XML standard is as complete as possible, individual 
implementations can vary tremendously. As a result, it is extremely 
difficult to get consensus on the definitions of data elements. For 
example, tags such as <PO_Number>, <PurchaseOrderNumber>, 
<PO_No>, and <purchase_order_number> could all be used by 
different applications to indicate a purchase order number. On the other 
hand, the different tag names could mean that different definitions of 
"Purchase Order Number" have been used. An XML processor cannot 
independently determine whether these tags all refer to the same thing. 
As a result, the processor must be given explicit instructions 
regarding what tags are equivalent or how to translate one set of tags 
to the format used by another system. 
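
The kind of explicit instruction described above can be sketched as a 
lookup table that renames each variant tag to one agreed name. This is 
a minimal illustration only: the table contents, the choice of 
"PurchaseOrderNumber" as the canonical tag, and the function name 
normalize_tags are assumptions, and the sketch presumes that the 
parties have already confirmed that the variant tags carry the same 
definition. 

```python
import xml.etree.ElementTree as ET

# Hypothetical equivalence table: each variant tag maps to one canonical name.
# A human must supply this table; the parser cannot infer it.
TAG_EQUIVALENTS = {
    "PO_Number": "PurchaseOrderNumber",
    "PO_No": "PurchaseOrderNumber",
    "purchase_order_number": "PurchaseOrderNumber",
}


def normalize_tags(xml_text):
    """Return the document with known variant tags renamed to the canonical tag."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Tags that are not in the table are left unchanged.
        element.tag = TAG_EQUIVALENTS.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


doc_a = "<Order><PO_Number>12345</PO_Number></Order>"
doc_b = "<Order><purchase_order_number>12345</purchase_order_number></Order>"

# Both documents come out with the same canonical tag.
print(normalize_tags(doc_a))
print(normalize_tags(doc_b))
```

A table of this kind must be written and maintained for every pair of 
vocabularies that need to interoperate, which is the translation burden 
the report describes. 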

If diverging data structures and vocabularies proliferate among 
different organizations and user communities, XML's overarching 
promise of broad data interoperability could become more difficult to 
achieve. The use of incompatible data structures would require 
developers to devote resources to an expensive and error-prone process 
of defining and implementing translation schemes to exchange 
information among the incompatible systems. 

The processing extensibility of XML can also have a downside, because 
it allows developers to add proprietary extensions to their specific 
implementations, which could defeat XML's goal of broad 
interoperability. It is easy to add elements to an XML document that 
place unique processing requirements and restrictions on the document, 
thus preventing other systems from being able to interpret it. An 
operating system vendor, for example, could add software "hooks" to 
XML documents that could be correctly processed only by machines 
running that vendor's particular operating system. The fact that the 
core XML standard is nonproprietary thus does not ensure that all 
applications built with it will also successfully interoperate. 

Another important challenge in implementing XML is maintaining 
adequate security. XML's ability to facilitate the direct transfer of 
data between systems that automatically interpret and process that 
data has the potential to increase security risks. When XML is used, 
the direct transfer of data may bypass important security checks, such 
as those built into intermediate data processing software (virus 
checkers, for example). For instance, when a site's virus checker 
examines incoming messages for malicious code, it will not be able to 
check tagged data embedded in XML documents, unless these data are in 
American Standard Code for Information Interchange (ASCII) format. The 
application that then tries to interpret the unchecked XML tags and 
act on the information could be tricked into processing malicious 
code, such as a virus. Because XML is still a relatively new 
technology, it is unclear how significant this potential vulnerability 
will be. We were unable to find documented examples of successful 
intrusions based on this potential vulnerability. 

To mitigate this risk, system developers need to ensure that security 
is addressed when XML-based systems are implemented. For example, 
measures can be taken to check the integrity of the data received by a 
computer system, and software can be used to screen the incoming data 
for malicious code. Likewise, a local store of commonly used DTDs and 
schemas can be maintained as a check against the integrity of the 
corresponding DTDs and schemas that come with XML documents from 
outside sources. 
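
One possible form of the local-store check is to compare a fingerprint 
of each DTD or schema received from outside against the fingerprint of 
a trusted local copy. The report does not prescribe a mechanism; the 
use of SHA-256, the layout of the store, the file name, and the 
function names below are all assumptions made for illustration. 

```python
import hashlib


def fingerprint(schema_bytes):
    """Return a SHA-256 hex digest of the raw DTD or schema bytes."""
    return hashlib.sha256(schema_bytes).hexdigest()


# Hypothetical trusted copy of a DTD held locally by the receiving system.
trusted_copy = b"<!ELEMENT PurchaseOrder (PO_Number, Amount)>"

# Hypothetical local store: DTD name -> fingerprint of the trusted copy.
LOCAL_STORE = {"purchase-order.dtd": fingerprint(trusted_copy)}


def matches_local_copy(name, received_bytes):
    """True only if the received DTD is known and identical to the local copy."""
    expected = LOCAL_STORE.get(name)
    return expected is not None and fingerprint(received_bytes) == expected


# An unmodified DTD passes; an altered or unknown one does not.
print(matches_local_copy("purchase-order.dtd", trusted_copy))
print(matches_local_copy("purchase-order.dtd", trusted_copy + b"<!-- altered -->"))
```

A byte-for-byte comparison of this kind rejects even harmless 
differences such as whitespace, so an operational check might compare 
a normalized form instead. 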

These are a few of the more significant challenges facing XML system 
implementers. Table 6 summarizes these and other key strengths and 
pitfalls of XML. 

Table 6: Strengths and Pitfalls of XML: 

Strength: XML's flexible, human-readable data tags and structures can 
be easily adapted to many different needs. 
Pitfall: Defining unique data tags and structures can potentially lead 
to compatibility problems with other systems and defeat the goal of 
broad-based data exchange. 

Strength: XML standards are freely available and nonproprietary. 
Pitfall: It is easy for vendors and others to build nonstandard 
extensions to their products and systems, which also could inhibit 
broad-based data exchange. For example, incompatible business 
vocabularies have already been developed. 

Strength: Information in XML documents can potentially be readily 
accessed and shared among disparate systems. 
Pitfall: Increasing access to information that is tagged in human-
readable form increases security concerns. 

Strength: It is easy to search tagged XML data for specific 
information. 
Pitfall: Data that are not highly structured—such as narrative text—
may be difficult to convert to XML. Further, converting nontagged 
information to XML format may require a significant effort without 
prior agreements and established data dictionaries. 

Strength: XML uses the nearly ubiquitous existing infrastructure of 
the Internet. 
Pitfall: Using the Internet involves greater security and reliability 
risks than using private communications links. 

[End of table] 

The Intellor Group, Inc., conducted a survey on XML benefits and 
challenges in 2001 and collected 232 responses from many different 
industries and government agencies.[Footnote 17] The respondents 
identified the major benefits of XML as (1) providing a common format 
that facilitates participation in business-to-business data exchanges, 
(2) establishing common data access techniques, (3) enabling 
integration of enterprise applications, and (4) achieving cost savings 
for data conversion. They identified XML's biggest challenges as (1) 
the immaturity of related standards, (2) the lack of IT staff 
qualified to develop and maintain XML-based systems, (3) choosing 
among competing standards, and (4) security for XML documents and XML-
based transactions. 

Governmentwide Actions to Promote XML Adoption Have Focused on 
Education and Outreach: 

To date, activities within the federal government to promote broad 
governmentwide adoption of XML technology have been limited. Neither 
OMB, which is responsible for developing and overseeing governmentwide 
policies and guidelines for agency IT management, nor NIST, which is 
responsible for developing federal information processing standards 
and guidelines, has defined an explicit governmentwide strategy for 
XML adoption to guide agency implementation efforts and ensure that 
agency enterprise architectures address incorporation of XML. Most 
governmentwide coordination activities have been performed by the XML 
Working Group, chartered by the federal CIO Council to facilitate 
effective and appropriate implementation of XML technology in the 
information systems of the federal government. The working group's 
activities have focused primarily on education and outreach. In 
addition, OMB officials told us that, as part of the annual budget 
preparation process, they have taken steps to encourage agencies to 
use XML consistently and share their development plans with other 
agencies. 

Given that the greatest benefits of XML adoption to the government may 
derive from its promise of facilitating broad interoperability among 
systems in different organizations, it is important that an explicit 
strategy be developed for coordinating XML implementation across the 
federal government's many departments and agencies. However, most XML 
development within the federal government to date has been undertaken 
independently by separate federal organizations, with little or no 
coordination with other agencies. OMB has not issued explicit guidance 
regarding the use of XML, other than to cite ebXML in its October 2001 
standards for success in expanding e-government, as previously 
discussed. Rather than formulating a specific strategy, OMB has relied 
on informal discussions with agency officials, as part of the budget 
preparation process, to encourage them to use XML consistently and 
share their development plans with other agencies. According to OMB 
officials, these actions, along with the XML Working Group's 
coordination activities, serve as the federal government's XML 
strategy. Further, NIST officials told us they are not planning to 
develop any federal information processing standards or other XML 
implementation guidance, which they do not believe are necessary at 
this time. However, we believe that, without a well-defined strategy, 
the government runs the risk that incompatible data formats and 
standards will proliferate and prevent agencies from being able to 
take full advantage of XML to substantially improve governmentwide 
data sharing. 

The XML Working Group was chartered by the CIO Council in September 
2000 to (1) identify pertinent standards and best practices, (2) 
establish partnerships with industry and public interest groups, (3) 
establish partnerships with governmental communities of interest, and 
(4) promote education and outreach. In addition, in its strategic plan 
for fiscal year 2001-2002, the CIO Council tasked the working group to 
use its Web site, xml.gov, to lay out an evolving strategy with specific 
tasks for the working group to undertake to promote the effective and 
well-coordinated usage of XML to support governmental functions. 

The XML Working Group has undertaken a number of education and 
outreach efforts, including (1) holding monthly meetings as a forum 
for presentations and discussions about XML-related topics, (2) 
establishing the xml.gov Web site for information sharing and 
dissemination, and (3) exploring opportunities for coordination with 
state governments. 

As part of its effort to promote education and outreach, the working 
group holds monthly meetings to hear presentations and engage in 
discussions on XML-related topics. The meeting minutes, presentations, 
and information on other XML-related activities are shared and 
disseminated via the xml.gov Web site, as well as an electronic 
mailing list. In addition, agencies choosing to share information 
about their XML efforts can do so by registering with the working 
group, which then posts information about each effort on its Web site. 
To further promote their activities, working group officials met with 
state CIOs to explore opportunities to engage the states more 
effectively in the group's activities. 

In an effort to identify best practices for XML adoption, the CIO 
Council issued, in January 2001, a call for all federal CIOs to 
participate in developing and improving the design and content of the 
xml.gov Web site. In addition, the CIOs were encouraged to register 
their agencies' XML-related activities, especially those that cut 
across communities of interest. As of December 2001, representatives 
from 24 projects and working groups at the federal, state, and 
nonprofit levels had registered their XML-related efforts. However, 
according to the co-chair of the XML Working Group, there were likely 
many other federal activities under way that had not been registered. 
For example, the XML projects at Justice and SEC cited previously had 
not been registered at that time. 

On its Web site, the XML Working Group noted that in developing an 
evolving strategy for the effective usage of XML, it faced a number of 
constraints and conditions, including very limited resources and the 
fact that it is not a policy-making body and has no operational 
responsibilities. According to a statement at xml.gov, the Web site 
itself is intended to be the embodiment of the working group's 
strategic plan. Because of the working group's constraints, the Web 
site does not provide specific guidance to agencies for implementing 
XML, participating in XML standards bodies, or incorporating XML 
requirements into enterprise architectures. 

NIST, along with GSA, has developed a Web-based standards road map to 
provide users with access to information regarding existing and 
emerging XML standards and activities related to electronic commerce. 
The standards road map allows users to identify standards information 
relevant to their individual projects and assess the applicability, 
maturity, and product availability associated with those activities. 
The tool can be accessed from the XML Working Group Web site or at 
[hyperlink, www.nist.gov/roadmap]. Although the standards road map has 
the potential to be a useful tool for promoting systems 
interoperability, it is still a work in progress because the standards 
are rapidly evolving. For example, technical specifications for UDDI 
are currently not in the standards road map. 

OMB officials told us that, as part of the annual budget preparation 
process, they have taken steps to encourage agencies to use XML 
consistently and share their development plans with other agencies. 
Specifically, according to the OMB officials, federal agencies that 
request funding for XML-based initiatives are instructed to (1) 
determine whether an implementation approach has already been 
developed in private industry that can be emulated to meet the 
agency's needs, and (2) submit their activities for listing on the 
xml.gov Web site so that other agencies can be made aware of their 
plans. Further, OMB officials said they discuss with agency officials 
the importance of updating sections of the agency's enterprise 
architecture—specifically the standards profile and technical 
reference model—to reflect their XML plans. As discussed previously, 
OMB has established a standard for success in the area of expanding e-
government by calling for agencies to minimize burden on business by 
reusing data previously collected or using ebXML or other open 
standards to receive transmissions. The agency has also begun using 
XML for its own databases of federal IT management information. 

Federal Government Needs Have Not Been Consolidated for Input to 
Standards-Setting Bodies: 

Several federal agencies are working individually with key industry 
and public interest groups to incorporate their unique requirements 
into standards and specifications as they are being developed. 
Specifically, officials from OMB, NIST, DISA, and GSA have each 
participated in one or more XML-related standards activities. However, 
no central focal point has been designated to identify cross-agency or 
governmentwide requirements for standard XML data structures or 
develop a dictionary of inherently governmental data tags. Further, no 
process has been implemented for consolidated collaboration with 
standards bodies on the development of XML standards and 
specifications to ensure that federal requirements are identified and 
incorporated. Past experience coordinating federal requirements for 
EDI suggests that one approach to resolving the problem would be to 
present a "single face to industry" through a single requirements 
coordinating committee. 

Based on individual agency initiative, several federal agencies are 
participating in standards initiatives led by organizations such as 
the American National Standards Institute (ANSI),[Footnote 18] 
UN/CEFACT, OASIS, and RosettaNet. For example, NIST is a member of 
OASIS and RosettaNet and has actively participated in the development 
of test suites to assess conformance with XML standards. NIST chairs 
several OASIS technical committees to influence the quality, 
correctness, and testability of ebXML specifications. In addition, 
NIST developed conformance test suites based on XML standards and 
submitted them to OASIS for the benefit of the entire community. 
Further, NIST co-sponsored a forum with ANSI in October 2001 to 
explore alternatives for using XML to improve ANSI's standards-setting 
process. GSA has also been active in standards setting by serving as a 
board member of the RosettaNet initiative. In addition, GSA officials, 
including the co-chair of the XML Working Group, have been actively 
participating in the development of ebXML standards at UN/CEFACT and 
OASIS. Also, OMB officials told us they were working with 
international organizations on trade-related standards. 

DISA participates in various standards bodies and consortiums, 
including ANSI, UN/CEFACT, OASIS, W3C, the Internet Engineering Task 
Force, and others. The agency has contributed to the development of 
the ebXML standards suite and has applied ebXML to its own electronic 
business processes. In addition, DISA is a member of the W3C Advisory 
Committee and coordinates with the Defense Logistics Agency in the 
development of W3C XML standards. 

Although these are valuable undertakings, none is specifically 
designed to serve the role of presenting unified federal requirements 
to standards bodies. The government's business processes are not 
necessarily the same as the private sector's, and in many cases 
government agencies may need to define unique data types and 
structures. The need for a defined set of inherently governmental data 
tags was highlighted in a recent study conducted by the Logistics 
Management Institute for GSA.[Footnote 19] The Institute was tasked to 
(1) identify the data elements associated with 22 commonly used 
government forms and (2) determine if those data elements were 
available in commercial registries. The study identified over 8,000 
data elements in the 22 specified forms. The study's final report 
stated that an intensive review of a subset of these elements found 
that for a very large number of them, no corresponding entry in any of 
the commercial registries was found. The Logistics Management 
Institute concluded that because existing commercial registries did 
not focus on many of the government's business processes, the 
government would need to develop new dictionaries of data tags, in 
concert with industry and the public, to meet its needs. 

Although similar needs for coordination have been successfully 
addressed in the past, the federal government does not have a process 
for providing consolidated input on XML to commercial standards 
bodies. Instead, OMB has allowed agencies to individually pursue 
participation in standards bodies to the extent that their interests 
and resources allow. As a result, participation has been limited and 
uncoordinated because it requires a commitment of staff resources that 
many agencies cannot afford, according to XML Working Group officials. 
OMB guidelines[Footnote 20] direct agencies to use voluntary consensus 
standards in lieu of government-unique standards, except where 
inconsistent with law or otherwise impractical. The guidelines also 
address agency participation in voluntary consensus standards bodies 
and describe procedures for satisfying the reporting requirements of 
the National Technology Transfer and Advancement Act of 1995 (Public 
Law 104-113). 

In the case of EDI, the federal government presented a "single face to 
industry" by chartering a Federal EDI Standards Management 
Coordinating Committee. The committee's objectives were to (1) adopt 
governmentwide EDI standards for implementation, (2) coordinate 
federal agency participation in EDI standards bodies to ensure 
adequate consideration of the government's business needs and to 
ensure consistency of position (thus presenting a "single face" to 
industry), and (3) share EDI information among agencies regarding 
current or planned implementations to avoid duplicate efforts and to 
streamline the process.[Footnote 21] As a result of the committee's 
work, a number of larger federal agencies are now successfully using 
EDI to conduct electronic business with established business partners. 

XML Interoperability across the Government Depends on an Effective 
Cross-Agency Registry: 

Systems developers in the federal government would benefit from the 
establishment of an XML registry, which they could consult to identify 
and obtain predefined data elements and structures that are already in 
use. The XML Working Group is in the process of building a pilot 
version of such a registry. However, the registry will be effective in 
supporting systems interoperability among federal agencies only if 
governmentwide policies are set, guidelines established, and a defined 
management and funding process put in place for operating the registry. 
		
In contrast to the "top down" approach of defining and mandating the 
use of specific data structures or vocabularies, a "bottom up" 
approach is to establish a centralized registry of XML components—
including data elements, DTDs, and schemas—and coordinate its use by 
XML systems developers. Under this arrangement, XML developers would 
be encouraged to submit data elements and structures used in their 
systems for inclusion in the registry. Other developers would then be 
able to look up these structures in the registry and incorporate them, 
as appropriate, into their own systems. Developers would have the 
incentive to reuse data structures found in the registry because doing 
so would save costs and also bring about interoperability with other 
existing systems. The more widely specific data elements and 
structures were used, the closer they would come to becoming de facto 
standards. 

A centralized registry would not necessarily include only a single 
option to address a specific business need. Overlapping variants of 
some types of tags, definitions, and data structures may be needed to 
address the needs of different communities. For example, a standard 
schema for military purchase orders might differ from a purchase order 
schema shared by a group of civilian agencies. Further, a government 
registry could link to a number of standard commercial variants 
defined for other communities of interest that may contain additional 
purchase order schemas used by specific industries. The chemical and 
automotive industries, for example, may use schemas that vary from 
each other as well as from the standard government version. A registry 
would provide access to, and information about, all relevant predefined 
data definitions and structures, which would allow developers to decide 
the extent to which they needed to adhere strictly to industry 
standards, government standards, or some combination. Figure 7 
summarizes how an XML developer could hypothetically use an XML 
registry. 

Figure 7: Using a Registry of XML Data Elements and Structures: 

[Refer to PDF for image: illustration] 

1) Developer needs XML tags and/or schema for a specific purpose. 

2) Developer queries federal registry. 

3) Registry returns results based on data elements stored in
distributed repositories: 
Agency A repository; Agency B repository; Agency C repository; UDDI
registry; ebXML registry & repository. 

4) Developer either uses an element from the registry or builds a new 
one and registers it for others to use. 

Source: GAO. 

[End of figure] 
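
The four steps in figure 7 can be sketched as a small in-memory 
registry. This is an illustration under stated assumptions, not a 
description of the pilot registry: the Component and Registry classes, 
the find_or_register function, and the rule of matching components by 
name alone are hypothetical, and a real registry would draw on 
distributed repositories rather than a single dictionary. 

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str        # tag or schema name
    definition: str  # human-readable meaning of the component
    source: str      # repository that holds the component


@dataclass
class Registry:
    # Maps a lowercased component name to the components registered under it.
    components: dict = field(default_factory=dict)

    def register(self, component):
        self.components.setdefault(component.name.lower(), []).append(component)

    def query(self, name):
        return list(self.components.get(name.lower(), []))


def find_or_register(registry, name, definition, source):
    """Reuse an existing component if one is registered; otherwise add a new one."""
    matches = registry.query(name)          # steps 2 and 3: query and results
    if matches:
        return matches[0]                   # step 4a: reuse an existing element
    new_component = Component(name, definition, source)
    registry.register(new_component)        # step 4b: build and register a new one
    return new_component


registry = Registry()
first = find_or_register(registry, "PurchaseOrderNumber",
                         "Identifier assigned to a purchase order",
                         "Agency A repository")
second = find_or_register(registry, "PurchaseOrderNumber",
                          "Identifier assigned to a purchase order",
                          "Agency B repository")
print(second.source)  # the second developer reuses Agency A's component
```

Matching by name alone is the simplest possible rule; it would not 
detect two components that share a name but differ in definition. 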

Although no registry of "inherently governmental" XML components has 
yet been established, work is under way to create a pilot version of a 
registry. According to XML Working Group officials, NIST has developed 
a specification of the functional requirements for the pilot registry, 
and the working group's leaders have determined that they can use a 
version of the system developed by the Defense Logistics Information 
Service to satisfy these requirements. No date has yet been set for 
putting the pilot registry into initial operation. 

According to the co-chair of the XML Working Group, a governmentwide 
registry can provide users with the ability to (1) discover and use 
pertinent XML components and (2) register additional components that 
are "inherently governmental" in nature if those already specified in 
commercial registries do not meet the users' requirements. With a 
registry in place, agencies could start using registered XML 
components, and de facto XML standards would thus begin to emerge 
within specific communities of interest. Under these circumstances, 
the CIO Council or OMB would be in a better position to define 
specific governmentwide standards at a later time, based in part on 
this activity. 

However, a government XML registry will be effective in supporting 
systems interoperability among federal agencies only if governmentwide 
policies are set, guidelines established, and a defined management and 
funding process put in place to operate the registry. Work on defining 
exactly how an operational governmentwide registry—and the data 
repositories associated with it—should be administered and maintained 
is not yet complete. The XML Working Group has recently established a 
subgroup to define registry-related policies and procedures. However, 
it has not yet defined a management process that specifies (1) who is 
allowed to register new XML components, (2) how input to the registry 
is to be verified, (3) to what extent developers will be required to 
consult the registry when building new XML data structures, (4) 
classes of compliance for categorizing how rigorously organizations 
adhere to the standard data structures and definitions, or (5) a 
configuration management process to keep track of successive versions 
of each registered component. Members of the group drafted an XML 
Developer's Guide in December 2001 that includes a proposed 
requirement that agency XML developers make use of the federal 
registry, but the draft guide has not yet been approved and adopted. 

Standard conventions for using XML's namespace feature and other rules 
for naming data elements, DTDs, and schemas in a consistent and 
unambiguous way have not yet been defined for the pilot registry. 
Without such a naming structure, different XML documents may use the 
same data tags for different definitions and structures. A standard 
use of the namespace feature would allow the tags in any given XML 
document to be traced back unambiguously to their proper definitions. 
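
The effect of the namespace feature can be illustrated with a short 
sketch. The example below uses Python's standard xml.etree.ElementTree 
module. The two namespace addresses and the tag names are hypothetical 
and do not refer to any actual registry. Both elements use the tag 
"title," but each resolves to a different namespace and therefore to a 
different definition. 

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace addresses; neither refers to a real registry.
DOC = """<record xmlns:pers="http://example.gov/ns/personnel"
        xmlns:proc="http://example.gov/ns/procurement">
  <pers:title>Contracting Officer</pers:title>
  <proc:title>Office Supplies Purchase Order</proc:title>
</record>"""


def tags_by_namespace(xml_text):
    """Return (namespace, local tag name, text) for each child element."""
    root = ET.fromstring(xml_text)
    results = []
    for child in root:
        # ElementTree expands a prefixed tag to "{namespace}localname".
        namespace, _, local_name = child.tag[1:].partition("}")
        results.append((namespace, local_name, child.text))
    return results
```

Without the namespace declarations, a receiving system would have no 
reliable way to tell which definition of "title" applies to each 
element. 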

The registry's management framework would also need to include 
definitions of different classes of compliance with the registry's 
data structures. In some cases, individual agency implementations may 
not need to be integrated with other government systems, and agencies 
may have compelling reasons to develop nonstandard data structures. 
The establishment of different classes of compliance would define how 
loosely or tightly an XML implementation would be connected to the 
registry and would outline the operational implications associated 
with each class. 

Once management policies and procedures are established, funding 
mechanisms will also be needed to support ongoing operation of the 
governmentwide registry. According to industry and XML Working Group 
officials, registry projects in the private sector to date have 
required significant commitments of resources. Thus it would be 
important to assess and plan for the expected costs of such an 
undertaking. 

XML Implementations Can Be More Effective within the Context of an 
Enterprise Architecture: 

Planning the effective use of a standard such as XML to promote data 
interoperability is part of the larger process of establishing and 
implementing an enterprise architecture.[Footnote 22] According to the 
CIO Council,[Footnote 23] an enterprise architecture establishes an 
agencywide roadmap to achieve an agency's mission through optimal 
performance of its core business processes within an efficient IT 
environment. Data, as a corporate asset, are key to an agency's 
vision, mission, goals, and daily work routine. The more efficiently 
an agency gathers, stores, uses, and protects data, the more 
productive it is. Thus, one of the major goals in developing an 
architecture is to minimize the burden of data collection, streamline 
data storage, and enhance data access. Planning XML usage within the 
context of an agency's enterprise architecture can contribute 
significantly to achieving this objective. 

A major component of an enterprise architecture is a standards 
profile, which defines the set of rules that governs systems 
implementation and operation. If agencies have a business need for 
XML, then the standards profile should be used to document the way in 
which XML standards and products will be used. 

Without an effort to build an enterprise architecture, including the 
underlying data architecture, implementing XML is likely to provide 
only a patchwork solution to systems interoperability. Typically, if 
multiple systems have been developed independently and without an 
overall architecture, they are likely to use many data element 
definitions and structures that overlap in function or are completely 
redundant. In addition, secondary or tertiary data elements—data that 
do not represent discrete information but are merely the calculated 
derivatives of primary data elements—are also likely to proliferate. 
If XML is simply added on to "glue" these systems together, the 
organization will have to carry the burden of maintaining many more 
data elements and definitions than are necessary, as well as all the 
translations needed to effectively pass data among the systems. 

We have recommended that an organization's data needs be assessed as a 
whole and an architecture defined that includes a core set of critical 
data elements and structures. Redundant elements, as well as secondary 
and tertiary elements, can then be eliminated, saving the organization 
the expense of maintaining them. XML can then be implemented more 
efficiently, with fewer translations required between elements that 
have different names but refer to the same thing. The organization 
will also be better prepared to define interfaces to external systems 
and data sources. According to a National Electronic Commerce 
Coordinating Council report,[Footnote 24] applying XML within 
government can yield greater benefits if agencies take the initial 
step of inventorying common data exchanges. 
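
The translation burden described above can be illustrated with a 
short sketch. In the example below, the tag names, the synonym table, 
and the list of derived elements are all hypothetical. Two 
independently developed systems describe the same order with 
different tags, and one also carries a derived total. Mapping both to 
a core set of elements and dropping the derived element yields 
identical records. 

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table: each system-specific tag maps to one
# core element name.
CORE_NAME = {
    "VendorName": "SupplierName",
    "Supplier": "SupplierName",
    "UnitCost": "UnitPrice",
    "PricePerUnit": "UnitPrice",
    "Qty": "Quantity",
}

# Hypothetical derived (secondary) elements that can be recalculated
# from primary elements and so need not be maintained.
DERIVED = {"TotalCost"}


def normalize(xml_text):
    """Translate a flat XML record into core element names, omitting
    derived elements."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        if child.tag in DERIVED:
            continue
        record[CORE_NAME.get(child.tag, child.tag)] = child.text
    return record


SYSTEM_A = ("<Order><VendorName>Acme</VendorName><UnitCost>2.50</UnitCost>"
            "<Qty>4</Qty><TotalCost>10.00</TotalCost></Order>")
SYSTEM_B = ("<Order><Supplier>Acme</Supplier>"
            "<PricePerUnit>2.50</PricePerUnit><Quantity>4</Quantity></Order>")
```

Every entry in the synonym table represents a translation that must 
be maintained. Agreeing on the core element names in advance would 
make the table unnecessary. 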

As with any element of an IT infrastructure, security issues need to 
be identified and addressed when XML is being implemented. As 
previously discussed, XML documents potentially could be used to 
transport malicious code—such as viruses and worms—into an agency's 
computer systems, because virus checkers do not always examine the 
content of XML documents. System design documents will need to include 
plans to compensate for this and other potential vulnerabilities. 

[End of Chapter 3] 

Chapter 4: Conclusions and Recommendations: 

Conclusions: 

XML has the potential to help the federal government significantly 
streamline the process of identifying, integrating, and processing 
information from widely dispersed systems and organizations. Many 
critical government functions depend on effective information sharing 
across organizational boundaries, yet the problem of overcoming 
obstacles to effective data sharing has never been satisfactorily 
resolved. Today, broad information sharing needs are at the forefront 
of national priorities. For example, identifying and countering a 
bioterrorist attack requires that important medical information be 
collected and integrated as rapidly and thoroughly as possible. 
Likewise, law enforcement information about known terrorists and their 
activities must also be integrated and shared at Internet speed. 
XML-based systems can play a valuable part in facilitating this kind of 
broad information exchange. 

XML's greatest benefits accrue when organizations, such as government 
agencies, use standard data exchange procedures and agree on standard 
data definitions and structures. Effectively using XML as a means to 
share data among disparate systems across the federal government will 
require agencies to conform to a range of technical and business 
standards. While XML's technical standards are largely in place, 
important business standards—including many planned standard 
vocabularies—have not yet been completed, and in some cases, standards 
development to date has resulted in incompatibilities. To the extent 
that these business standards address government needs as they are 
developed, government agencies will likely have less of a need to 
develop their own nonstandard data vocabularies and structures. 

Given that a complete set of XML-related standards is not yet 
available, system developers must be wary of several pitfalls 
associated with implementing XML that could limit its potential to 
facilitate broad information exchange or adversely affect 
interoperability, including (1) the risk that redundant data 
definitions, vocabularies, and structures will proliferate, (2) the 
potential for proprietary extensions to be built that would defeat 
XML's goal of broad interoperability, and (3) the need to maintain 
adequate security. 

While education and outreach are important activities that are already 
under way in the federal government, an explicit strategy for adopting 
XML across the government has not yet been defined. Such a strategy is 
an important foundation for promoting standardization across agencies 
and facilitating broad information exchange while at the same time 
reserving the flexibility for agencies to tailor their use of XML to 
best meet their needs. Without a well-defined strategy, the government 
runs the risk that incompatible data formats and standards will 
proliferate and prevent agencies from being able to take full 
advantage of XML to substantially improve governmentwide data sharing. 

The federal government, which is committed to adopting commercial 
standards wherever possible, still has the opportunity to have its 
needs considered in the process of developing these standards. 
However, federal requirements have not yet been identified and 
consolidated so that they can be clearly communicated to the standards 
bodies that are currently at work on XML business standards. 

Given that XML is still in the early stages of its development and 
implementation, a top down strategy of predefining XML data structures 
and designating specific commercial standards, such as ebXML, as 
universal solutions for addressing interoperability is not likely to 
be effective. Instead, to be effective, the government's strategy must 
balance top down guidance with bottom up incentives that encourage 
agency initiative and provide leeway for agencies to develop 
implementations that best meet their needs. Specifically, establishing 
an operational registry for XML data elements and structures with 
incentives for agencies to make use of it could encourage a bottom up 
development of de facto standards. As elements of a government XML 
vocabulary became standardized through this registry on a de facto 
basis, the government would be in a better position at a later date to 
revisit the question of what commercial standards and vocabularies to 
officially endorse. The XML Working Group is developing a pilot 
registry along these lines, but it is not yet operational and lacks an 
agreed-upon set of policies and guidelines to promote the broadest 
possible use. 

XML's larger promise of facilitating data exchange across broad 
domains (such as an entire agency, a group of agencies, or a set of 
external stakeholders and client organizations) will be difficult to 
realize until critical data elements and structures are identified and 
standardized across entire agencies and communities of interest. This 
task of identifying and standardizing critical data elements and 
structures is part of an agency's larger task of developing an 
enterprise architecture. Well-planned enterprise architectures can 
also promote the adoption of flexible implementations that can be 
modified in the future to conform to commercial standards that become 
established over time. Thus, agency enterprise architectures are key 
building blocks to effective governmentwide adoption of XML. 

Recommendations for Executive Action: 

Given the statutory responsibility of OMB to develop and oversee 
governmentwide policies and guidelines for agency IT management, we 
recommend that the director of OMB, working in concert with the 
federal CIO Council and NIST, develop a strategy for governmentwide 
adoption of XML to guide agency implementation efforts and ensure that 
the technology is addressed in agency enterprise architectures. This 
strategy should, at a minimum, specify how the federal government will 
address the following tasks: 

* Developing a process with defined roles, responsibilities, and 
accountability for identifying and coordinating government-unique 
requirements and presenting consolidated, focused input to private 
sector standards-setting bodies during the development of XML 
standards. This process could be patterned after the current process 
that is in place for EDI coordination among federal agencies, or OMB 
might consider adapting the EDI process to cover XML as well. Guiding 
the overall process should be the presumption that mature, agreed-upon 
commercial standards will be adopted by the government whenever 
possible. 

* Developing a project plan for transitioning the CIO Council's pilot 
XML registry effort into an operational governmentwide resource. This 
plan should include identifying time frames and resources needed to 
implement and maintain an operational registry linked to agency 
repositories of standard data structures. 

* Setting policies and guidelines for managing and participating in 
the governmentwide XML registry, once it is operational, to ensure its 
effectiveness in promoting data sharing capabilities among federal 
agencies. These policies should clarify the roles and responsibilities 
of specific agencies and should consider including definitions of 
classes of compliance, which could be used to categorize how 
rigorously organizations adhere to the policies. Further, these 
policies should promote the consistent use of XML namespaces to 
resolve potential ambiguity in data references across XML documents. 

In addition, as part of its ongoing process for reviewing agency IT 
architectures and annual budget requests, we recommend that OMB ensure 
that agencies' business needs for XML technology are defined in their 
enterprise architectures. Specifically, OMB should specify 
requirements for documenting the usage of XML standards and products 
in the standards profile section of the architecture—the section that 
defines the set of rules governing systems implementation and 
operation. 

Agency Comments and Our Evaluation: 

In oral comments on a draft of this report, officials from OMB's 
Office of Information and Regulatory Affairs, including the 
Information Policy and Technology Branch chief, generally agreed with 
our findings and conclusions and stated that they would consider our 
recommendations. The officials also provided information on recent OMB 
actions aimed at promoting the adoption of XML by federal agencies. We 
have incorporated this updated information in the report. We view 
these recent OMB actions as positive steps. Nevertheless, we also 
believe that OMB can improve on these actions by implementing the 
recommendations in this report. 

We received oral comments from the co-chairmen of the XML Working 
Group; officials of NIST's Information Technology Laboratory; and the 
deputy associate administrator, Office of Electronic Commerce, GSA. We 
also received written comments from the chief information officer, 
National Aeronautics and Space Administration; and the director for 
policy and communications staff, National Archives and Records 
Administration. Letters from these latter two agencies are reprinted 
in appendixes I and II. All of the agency officials who reviewed the 
draft agreed with the overall content of the report. Officials from 
the XML Working Group and the National Archives and Records 
Administration expressed concern that the draft overemphasized the 
value of a "top down" XML implementation strategy that emphasizes 
executive direction and guidance as opposed to a "bottom up" approach 
relying on individual initiative at lower management levels. We 
believe that it is important to strike a balance between the two 
approaches. In response to this concern, we are including language in 
the final report to emphasize that a balance between the bottom up and 
top down approaches is needed. In addition, each agency provided 
technical comments, which have been addressed where appropriate in the 
final report. 

[End of Chapter 4] 

Appendix I: Comments from the National Aeronautics and Space 
Administration: 

National Aeronautics and Space Administration: 
Office of the Administrator: 
Washington, DC 20546-0001: 

March 18, 2002: 
	
Mr. John A. de Ferrari: 
Assistant Director: 
U.S. General Accounting Office: 
441 G Street, NW, Room 4T21: 
Washington, DC 20548: 

Dear Mr. De Ferrari: 

Thank you for the opportunity to comment on the draft GAO report, 
"Electronic Government: Challenges to Effective Adoption of the 
Extensible Markup Language." The report is quite comprehensive and 
effectively communicates the history, potential benefits, and 
challenges of adopting the Extensible Markup Language (XML). 

The draft report clearly demonstrates that XML, as contrasted with 
other emerging technologies, presents virtually unique challenges in 
that its effective use requires the convergence of both technical and 
business standards, and the business standards span virtually all 
segments of the private sector and government. In the case of most 
other technologies, the standards battles are usually fought at the 
technical level and are much less dependent on the vocabulary and 
business processes of potential industry and government users. In the 
case of XML, the World Wide Web Consortium (W3C) has worked out the 
technical standards, but each segment of the private sector is 
struggling through the process of developing its XML business 
standards. Since this process requires the cooperation of competitors, 
the final products are difficult to achieve and long in coming. In 
some areas key to the performance of the government, because the 
private sector is proceeding slowly or does not have requirements, the 
government, working cooperatively with the private sector, should take 
the lead in defining the government-unique business standards. 

To date, individual government departments and agencies (as documented 
in your draft report) have begun using XML based on a tradeoff of the 
benefits of its use in an incomplete business standards environment, 
versus the risk that their implementations will have to be redone to 
conform to business standards that are eventually finalized by the 
private sector segments with whom they interact. Given the current 
status of XML standards, this seems to be a rational approach. 
Therefore, while there is benefit to formalizing a government-wide 
strategy for adoption of XML along the lines described in the draft 
report, until XML business standards are much further along towards 
finalization, for the foreseeable future individual government 
entities will likely have to continue with the same risk assessment 
and trade-off approach in their implementations of XML. 

Implementing the elements of the XML strategy described in the draft 
report would help drive successful adoption of this technology across 
the government, but, to be effective, would require a significant 
commitment of new resources to groups such as the CIO Council XML 
Working Group, and should not be undertaken unless those resources are 
provided. 

Please contact Mr. Robert Benedict at (202) 358-1475 or at 
robert.benedict@hq.nasa.gov for questions on or clarification of these 
comments. 

Cordially yours, 

Signed by: 

Lee B. Holcomb: 
Chief Information Officer: 

[End of section] 

Appendix II: Comments from the National Archives and Records 
Administration: 

National Archives and Records Administration: 
8601 Adelphi Road: 
College Park, Maryland 20740-6001: 

March 14, 2002: 

John A. de Ferrari: 
Assistant Director: 
U.S. General Accounting Office: 
441 G Street, N.W. Room 4T21: 
Washington, D.C. 20548: 

Dear Mr. De Ferrari: 

The National Archives and Records Administration (NARA) appreciates 
the opportunity to review and comment on the draft GAO report, 
"Electronic Government: Challenges to Effective Adoption of the 
Extensible Markup Language." We believe that the report accurately 
describes the present state of XML in the Federal Government. 

NARA strongly supports use of XML by the Federal Government. Indeed, 
our Electronic Records Archives (ERA) project will have XML as one of 
the building blocks to provide a dynamic solution that incorporates 
the expectation of continuing change in information technology and in 
the records it produces. We suggest that you include NARA's ERA 
project in your examples of agencies that are using XML in the section 
that begins on page 32. Beyond the ERA project, we suggest that GAO 
could also emphasize the use of XML in records management and 
recordkeeping for agencies. 

We appreciate GAO's recognition that there will be multiple Government 
XML registries/repositories. NARA will be working with the XML Working 
Group on the development of their pilot registry and will have a 
robust registry/repository as part of ERA. We may be the appropriate 
agency to host the cross-agency centralized registry. 

Finally, we strongly support the "bottom up" communities of interest 
approach to implementing XML in the Government. Although the draft 
report discusses this approach, it appears that you prefer the "top 
down" approach used to develop EDI. We believe that agencies should 
determine which approach better addresses their needs. 

Thank you for the opportunity to provide these comments. If you have 
any questions about our comments, please contact me. 

Sincerely, 

Signed by: 

Lori A. Lisowski: 
Director: 
Policy and Communications Staff: 

[End of section] 

Glossary: 

Application Programming Interface: 
The interface between the application software and the application 
platform (i.e., operating system), across which all services are 
provided. 

Attribute: 
A property associated with a specific data element in an XML document. 

Business Process: 
A collection of related, structured activities—a chain of events—that
produce a specific service or product for a particular customer or 
customers. 

Collaboration Protocol Agreement: 
Information that identifies or describes the specific collaboration 
protocol that two (or more) parties have agreed to use. 

Collaboration Protocol Profile: 
Information about a party that describes one or more business 
processes and associated protocols that the party supports for 
purposes of collaboration. 

Data Type: 
A description of the attributes of a specific set of data, such as 
whether it represents integers or text strings. 

Document Type Definition (DTD): 
A file that describes the structure of XML documents and defines how 
markup tags should be interpreted. A DTD can be used to automatically 
interpret multiple documents in a uniform way. 

Electronic Business: 
The exchange of information within or among enterprises by electronic
means for the purpose of conducting business transactions or other 
related activities. 

Electronic Commerce: 
Business done electronically, including the sharing of standardized
unstructured or structured business information by any electronic 
means. 

Electronic Data Interchange (EDI): 
The automated exchange of predefined and structured business data 
among information systems of two or more organizations. Federal 
government use of EDI is governed by Federal Information Processing 
Standard 161-2. 

Electronic Government: 
Government's use of technology, particularly Web-based applications, to
enhance the access to and delivery of government information and 
services to citizens, business partners, employees, other agencies, 
and government entities. 

Encryption: 
Cryptographic transformation of data (called "plaintext") into a form
(called "ciphertext") that conceals the data's original meaning to 
prevent it from being known or used. 

Enterprise Architecture: 
An institutional systems blueprint that defines in both business and
technology terms an organization's current and target operating 
environments and provides a road map for moving between the two. 

Extensible Markup Language (XML): 
A flexible, nonproprietary set of standards for tagging information so 
that it can be transmitted using Internet protocols and readily 
interpreted by disparate computer systems. 

Extensible Stylesheet Language (XSL): 
A language used to transform XML-based data into HTML or other 
presentation formats for display in a variety of media. 

Hypertext Markup Language (HTML): 
The standard markup language used to display information on the Web. 
It uses tags embedded in text files to encode instructions for 
formatting and displaying the information. 

Interoperability: 
The ability of two or more systems or components to exchange
information and to use the information that has been exchanged. 

Markup: 
The addition of tags or labels to data elements in a document to provide
processing instructions or to indicate structure or meaning. 

Metadata: 
Data containing descriptive information about other data. For example, a
block of numerical data might be identified in metadata as 
representing unit cost in dollars. 

Namespace: 
A unique identifier, such as a Web address, referenced at the start of 
an XML document as a source for definitions of the tags and other data 
structures used in the document. An XML document can reference more 
than one namespace. 

Parser: 
Software that reads an XML document and determines the structure and
properties of the data in the document. 

Registry: 
An electronic listing of specifications—such as DTDs, XML schemas, and
the metadata about them—as well as pointers to their locations (called 
repositories). 

Repository: 
A location or set of distributed locations where registry items reside 
and from which they can be retrieved and used in conjunction with 
marked up documents, such as XML documents. 

Schema: 
A set of custom tags and attributes that defines the permissible tagging
structure for an XML document and conforms to the W3C Schema 
specification. 

Search Engine: 
A program that searches documents for specified keywords and returns a
list of the documents where the keywords are found. 

Style Sheet: 
A text file that provides instructions for formatting and displaying 
the information in XML documents. Style sheets can include variations 
depending on the type of device used to access the document. For 
example, the same XML document could be displayed differently on a 
handheld wireless computer or a desktop computer, based on different 
style sheets. 

Valid XML Document: 
An XML document that has an associated document type declaration and
that complies with the specifications expressed in it. 

Well-formed XML Document: 
An XML document that conforms to the W3C XML specification. 

XML Document: 
A text document marked up with hierarchically arranged descriptive tags
and attributes. An XML document can also begin with declarations that 
refer to other files providing further instructions for interpreting 
and displaying data elements. 

XML Path Language (XPath): 
A language for referencing specific parts of an XML document. 

XML Processor: 
A software module used to read XML documents and give applications
access to their content and structure. Validating processors also 
identify discrepancies with the XML 1.0 standard and the constraints 
expressed in DTDs and external entities referenced in an XML document. 

XSL Transformation (XSLT): 
An extension to the XSL standard that provides commands to transform 
one XML document into either another XML document or a different 
format, such as HTML. 

[End of section] 

Footnotes: 

[1] Tagging is accomplished by labeling each element of a data set to 
clarify what kind of information is being provided. For example, "1600 
Pennsylvania Avenue" could be tagged to show that it refers to an 
address. In XML, the result would be <Address> 1600 Pennsylvania 
Avenue </Address>. 

[2] Interoperability is the ability of two or more systems or 
components to exchange information and to use the information that has 
been exchanged. 

[3] The W3C was founded in 1994 by Tim Berners-Lee, the inventor of 
the Web, to lead development of common protocols that promote the 
evolution of the Web and ensure interoperability. 

[4] Metadata are data containing descriptive information about other 
data. For example, a block of numerical data might be identified in 
metadata as representing unit cost in dollars. 

[5] Standard Generalized Markup Language, ISO 8879:1986. 

[6] EDI is the automated exchange of predefined and structured 
business data among information systems of two or more organizations. 

[7] Giga Information Group, XML's Role in the EDI World (June 23, 
2000). 

[8] Logistics Management Institute, Open Buying on the Internet and 
Extensible Markup Language: Recommendations on Adoption by the Federal 
Government (January 2000). 

[9] Widely used Internet protocols include Simple Mail Transfer 
Protocol (SMTP) for electronic mail, Hypertext Transfer Protocol 
(HTTP) for the World Wide Web, File Transfer Protocol (FTP) for file 
transfer, and others. 

[10] Giga Information Group, Giga Survey: XML Achieving Mainstream 
Usage (April 30, 2001). 

[11] The OpenTravel Alliance is a self-funded, nonprofit organization 
working to create and implement industrywide, open electronic business 
specifications. Membership in the alliance includes major airlines, 
hoteliers, car rental companies, travel agencies, and other interested 
parties. 

[12] Lessons learned report of the XML subgroup of the Global Advisory 
Committee Infrastructure/Standards Working Group, Department of 
Justice, October 2001. 

[13] In the terminology used by the W3C, a standard is finalized when 
it is formally approved as a "recommendation." Earlier versions are 
termed working drafts, candidate recommendations, and proposed 
recommendations. 

[14] UN/CEFACT is the United Nations' Center for the Facilitation of 
Procedures and Practices for Administration, Commerce, and Transport. 
OASIS is the Organization for the Advancement of Structured 
Information Standards. OASIS is an international nonprofit consortium 
that promotes open, collaborative development of interoperability
specifications to advance electronic business. 

[15] Office of Management and Budget, Memorandum M-02-02, 
Implementation of the President's Management Agenda and Presentation 
of the FY 2003 Budget Request (October 30, 2001). 

[16] In October 2001, OASIS formed the OASIS Universal Business 
Language (UBL) Technical Committee to define a common XML business 
document library. 

[17] Intellor Group, Inc., XML Adoption: Benefits and Challenges 
(2001). 

[18] ANSI is a private, nonprofit organization that administers and 
coordinates the U.S. voluntary standardization and conformity 
assessment system. 

[19] Mark Crawford, Donald F. Egan, and Angela Jackson, Federal Tag 
Standards for Extensible Markup Language, Logistics Management 
Institute (June 2001). 

[20] Office of Management and Budget, Circular A-119, Federal 
Participation in the Development and Use of Voluntary Consensus 
Standards and in Conformity Assessment Activities (February 10, 1998). 

[21] This process is described in FIPS Publication 161-2, Electronic 
Data Interchange (EDI). 

[22] An enterprise architecture is an institutional systems blueprint 
that defines in both business and technology terms the organization's 
current and target operating environments and provides a road map for 
moving between the two. 

[23] Chief Information Officers Council, A Practical Guide to Federal 
Enterprise Architecture, Version 1.0 (February 2001). 

[24] National Electronic Commerce Coordinating Council, An 
Introduction to XML's Potential Use within Government (December 2000). 

[End of section] 

GAO’s Mission: 

The General Accounting Office, the investigative arm of Congress, 
exists to support Congress in meeting its constitutional 
responsibilities and to help improve the performance and accountability 
of the federal government for the American people. GAO examines the use 
of public funds; evaluates federal programs and policies; and provides 
analyses, recommendations, and other assistance to help Congress make 
informed oversight, policy, and funding decisions. GAO’s commitment to 
good government is reflected in its core values of accountability, 
integrity, and reliability. 

Obtaining Copies of GAO Reports and Testimony: 

The fastest and easiest way to obtain copies of GAO documents at no 
cost is through the Internet. GAO’s Web site [hyperlink, 
http://www.gao.gov] contains abstracts and full-text files of current 
reports and testimony and an expanding archive of older products. The 
Web site features a search engine to help you locate documents using 
key words and phrases. You can print these documents in their entirety, 
including charts and other graphics. 

Each day, GAO issues a list of newly released reports, testimony, and 
correspondence. GAO posts this list, known as “Today’s Reports,” on its 
Web site daily. The list contains links to the full-text document 
files. To have GAO e-mail this list to you every afternoon, go to 
[hyperlink, http://www.gao.gov] and select “Subscribe to daily E-mail 
alert for newly released products” under the GAO Reports heading. 

Order by Mail or Phone: 

The first copy of each printed report is free. Additional copies are $2 
each. A check or money order should be made out to the Superintendent 
of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or 
more copies mailed to a single address are discounted 25 percent. 
Orders should be sent to: 

U.S. General Accounting Office: 
441 G Street NW, Room LM: 
Washington, D.C. 20548: 

To Order by Phone: 
Voice: (202) 512-6000: 
TDD: (202) 512-2537: 
Fax: (202) 512-6061: 

To Report Fraud, Waste, and Abuse in Federal Programs, Contact:
Web site: [hyperlink, http://www.gao.gov/fraudnet/fraudnet.htm]: 
E-mail: fraudnet@gao.gov: 
Automated answering system: (800) 424-5454 or (202) 512-7470: 

Public Affairs: 

Jeff Nelligan, managing director, NelliganJ@gao.gov: 
(202) 512-4800: 
U.S. General Accounting Office: 
441 G Street NW, Room 7149:
Washington, D.C. 20548: