This is the accessible text file for GAO report number GAO-03-273G entitled 'Assessing the Reliability of Computer-Processed Data' which was released on October 1, 2002. This text file was formatted by the U.S. Government Accountability Office (GAO) to be accessible to users with visual impairments, as part of a longer term project to improve GAO products' accessibility. Every attempt has been made to maintain the structural and data integrity of the original printed product. Accessibility features, such as text descriptions of tables, consecutively numbered footnotes placed at the end of the file, and the text of agency comment letters, are provided but may not exactly duplicate the presentation or format of the printed version. The portable document format (PDF) file is an exact electronic replica of the printed version. We welcome your feedback. Please E-mail your comments regarding the contents or accessibility features of this document to Webmaster@gao.gov. This is a work of the U.S. government and is not subject to copyright protection in the United States. It may be reproduced and distributed in its entirety without further permission from GAO. Because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately. Note: In July 2009, this product was superseded by GAO-09-680G, Applied Research and Methods: Assessing the Reliability of Computer-Processed Data, available at [hyperlink, http://www.gao.gov/products/GAO-09-680G]. 
United States General Accounting Office: GAO: Applied Research and Methods: October 2002: External Version I: Assessing the Reliability of Computer-Processed Data: GAO-03-273G: Contents: Preface: Section 1: Introduction: Section 2: Understanding Data Reliability: Section 3: Deciding If a Data Reliability Assessment Is Necessary: Conditions Requiring a Data Reliability Assessment: Conditions Not Requiring a Data Reliability Assessment: Section 4: Performing a Data Reliability Assessment: Timing the Assessment: Documenting the Assessment: Section 5: Viewing the Entire Assessment Process: Section 6: Taking the First Steps: Reviewing Existing Information: Performing Initial Testing: Dealing with Short Time Frames: Section 7: Making the Preliminary Assessment: Factors to Consider in the Assessment: Outcomes to Consider in the Assessment: Section 8: Conducting Additional Work: Tracing to and from Source Documents: Using Advanced Electronic Testing: Reviewing Selected System Controls: Using Data of Undetermined Reliability: Section 9: Making the Final Assessment: Sufficiently Reliable Data: Not Sufficiently Reliable Data: Data of Undetermined Reliability: Section 10: Including Appropriate Language in the Report: Sufficiently Reliable Data: Not Sufficiently Reliable Data: Data of Undetermined Reliability: Glossary of Technical Terms: Figures: Figure 1: Factors to Consider in Making the Decision on Using the Data: Figure 2: Decision Process for Determining If a Data Reliability Assessment Is Required: Figure 3: Data Reliability Assessment Process: Figure 4: The First Steps of the Assessment: Figure 5: The Preliminary Assessment: Figure 6: Choosing and Conducting Additional Work: Figure 7: Making the Final Assessment: [End of section] Preface: Computer-processed data, often from external sources, increasingly underpin audit reports, including evaluations (performance audits) and financial audits. Therefore, the reliability of such data has become more and more important. 
Historically, computer-processed data have been treated as unique evidence. However, these data are simply one form of evidence relied on, although they may require more technical assessment than other forms of evidence. In addition, the very nature of the information system creating the data allows opportunities for errors to be introduced by many people. This guidance is intended to demystify the assessment of computer-processed data. It supplements GAO’s “Yellow Book” (Government Auditing Standards, 1994 Revision), which defines the generally accepted government auditing standards (GAGAS), and replaces the earlier GAO guidance, Assessing the Reliability of Computer-Processed Data (GAO/OP-8.1.3, Sept. 1990). For all types of evidence, various tests are used—sufficiency, competence, and relevance—to assess whether the evidence standard is met. You probably have been using these tests for years and have become quite proficient at them. But because assessing computer-processed data requires more technical tests, it may appear that such data are subject to a higher standard of testing than other evidence. That is not the case. For example, many of the same tests of sufficiency and relevance are applied to other types of evidence. But in assessing computer-processed data, the focus is on one test in the evidence standard—competence—which includes validity and reliability. Reliability, in turn, includes the completeness and accuracy of the data. This guidance, therefore, provides a flexible, risk-based framework for data reliability assessments that can be geared to the specific circumstances of each engagement. The framework also provides a structure for planning and reporting, facilitates bringing the right mix of skills to each engagement, and ensures timely management buy-in on assessment strategies. 
The framework is built on: * making use of all existing information about the data; * performing at least a minimal level of data testing; * doing only the amount of work necessary to determine whether the data are reliable enough for our purposes; * maximizing professional judgment, and; * bringing the appropriate people, including management, to the table at key decision points. The ultimate goal of the data reliability assessment is to determine whether you can use the data for your intended purposes. This guidance is designed to help you make an appropriate, defensible assessment in the most efficient manner. With any related questions, call Barbara Johnson, focal point for data reliability issues, at (202) 512-3663, or Barry Seltser, the Acting Director of GAO’s Center for Design, Methods, and Analysis, at (202) 512-3234. Signed by: Nancy Kingsbury: Managing Director, Applied Research and Methods: [End of section] Section 1: Introduction: This guidance explains what data reliability means and provides a framework for assessing the reliability of computer-processed data. It begins with the steps in a preliminary assessment, which, in many cases, may be all you need to do to assess reliability. This guidance also helps you decide whether you should follow up the preliminary assessment with additional work. If so, it explains the steps in a final assessment and the actions to take, depending on the results of your additional work. The ultimate goal in determining data reliability is to make the following decision: For our engagement, can we use the data to answer the research question? See figure 1 for an overview of the factors that help to inform that decision. Not all of these factors may be necessary for all engagements. Figure 1: Factors to Consider in Making the Decision on Using the Data: [See PDF for image] This figure is an illustration of the factors to consider in making the decision on using the data. 
The following data is depicted: Use the Data or not: Factors: Significance of data in answering research questions; Results of preliminary assessment; Strength of corroborating evidence; Results of review of selected system controls; Results of advanced electronic testing; Results of tracing to or from source documents; Degree of risk. Source: GAO. [End of figure] In addition, this guidance discusses suggested language—appropriate under different circumstances—for reporting the results of your assessment. Finally, it provides detailed descriptions of all the stages of the assessment, as well as a glossary of technical terms used (see p. 33). An on-line version of this guidance, which will include tools that may help you in assessing reliability, is currently being developed. The overall process is illustrated in figures 2 (p. 7) and 3 (p. 13). [End of section] Section 2: Understanding Data Reliability: Data reliability refers to the accuracy and completeness of computer-processed data, given the intended purposes for use. Computer-processed data include data (1) entered into a computer system and (2) resulting from computer processing. Computer-processed data can vary in form—from electronic files to tables in published reports. The definition of computer-processed data is therefore broad. In this guidance, the term data always refers to computer-processed data. The “Yellow Book” requires that a data reliability assessment be performed for all data used as support for engagement findings, conclusions, or recommendations. [Footnote 1] This guidance will help you to design a data reliability assessment appropriate for the purposes of the engagement and then to evaluate the results of the assessment. Data are reliable when they are (1) complete (they contain all of the data elements and records needed for the engagement) [Footnote 2] and (2) accurate (they reflect the data entered at the source or, if available, in the source documents). 
A subcategory of accuracy is consistency. Consistency refers to the need to obtain and use data that are clear and well-defined enough to yield similar results in similar analyses. For example, if data are entered at multiple sites, inconsistent interpretation of data rules can lead to data that, taken as a whole, are unreliable. Reliability also means that for any computer processing of the data elements used, the results are reasonably complete and accurate, meet your intended purposes, and are not subject to inappropriate alteration. Assessments of reliability should be made in the broader context of the particular characteristics of the engagement and the risk associated with the possibility of using data of insufficient reliability. Reliability does not mean that computer-processed data are error-free. Errors are considered acceptable under these circumstances: You have assessed the associated risk and found the errors are not significant enough to cause a reasonable person, aware of the errors, to doubt a finding, conclusion, or recommendation based on the data. While this guidance focuses only on the reliability of data in terms of accuracy and completeness, other data quality considerations are just as important. In particular, you should also consider the validity of data. Validity (as used here) refers to whether the data actually represent what you think is being measured. For example, if a data field is named “annual evaluation score,” is this an appropriate measure of a person’s job performance? Considerations of data validity and reliability issues should be addressed early in the engagement, and appropriate technical specialists—such as data analysts, statisticians, or information technology specialists—should be consulted. [End of section] Section 3: Deciding If a Data Reliability Assessment Is Necessary: To decide if a data reliability assessment is necessary, you should consider certain conditions. 
The engagement type and planned use of the data help to determine when you should assess data reliability. See figure 2 for an illustration of the decision process that you should use. Figure 2: Decision Process for Determining If a Data Reliability Assessment Is Required: [See PDF for image] This figure is a wireframe of the decision process for determining if a data reliability assessment is required. The following data is depicted: What is the type of engagement: Financial or financial-related audit; Use guidance in FAM or FISCAM. What is the type of engagement: All other engagements. Do you anticipate that the data will be significant to findings, conclusions, or recommendations? If yes: Does the research question require a determination of the reliability of an information system? If no: Note: Primarily background information: * Determine if best available source; * Disclose the source and that no reliability assessment was performed. Does the research question require a determination of the reliability of an information system? If yes: Conduct a computer system review and disclose in OSM the work done, results, and any limitations found; If no: Will the data be used on multiple future engagements? Will the data be used on multiple future engagements? If yes: Should you do a computer system review? If no: Continue with a data reliability assessment. Should you do a computer system review? If yes: Conduct a computer system review and disclose in OSM the work done, results, and any limitations found; If not at this time: Continue with a data reliability assessment. Source: GAO. [End of figure] Conditions Requiring a Data Reliability Assessment: You should assess reliability if the data to be analyzed are intended to support the engagement findings, conclusions, or recommendations. Keep in mind that a finding may include only a description of the condition, as in a purely descriptive report. 
In the audit plan for the engagement, you should include a brief discussion of how you plan to assess data reliability, as well as any limitations that may exist due to shortcomings in the data. Conditions Not Requiring a Data Reliability Assessment: You do not need to assess reliability if the data are used (1) only as background information or (2) in documents without findings, conclusions, or recommendations. Background information generally sets the stage for reporting the results of an engagement or provides information that puts the results in proper context. Such information could be the size of the program or activity you are reviewing, for example. When you gather background or other data, ensure that they are from the best available source(s). When you present the data, cite the source(s) and state that the data were not assessed. Sometimes, as a best practice, however, you may want to do some assessment of background data. Your judgment of the data’s importance and the reliability of the source, as well as other engagement factors, can help you determine the extent of such an assessment. Finally, for financial audits and information system reviews, you should not follow this guidance in assessing data reliability. For financial audits, which include financial statement and financial-related audits, you should follow the GAO/PCIE Financial Audit Manual (FAM) and the Federal Information System Controls Audit Manual (FISCAM). In an information system review, all controls in a computer system, for the full range of application functions and products, are assessed and tested. Such a review includes (1) examining the general and application controls of a computer system, [Footnote 3] (2) testing whether those controls are being complied with, and (3) testing data produced by the system. [Footnote 4] To design such a review, appropriate to the research question, seek assistance from information technology specialists. 
[End of section] Section 4: Performing a Data Reliability Assessment: To perform a data reliability assessment, you need to decide on the timing—when to perform the assessment—and how to document it. Timing the Assessment: A data reliability assessment should be performed as early as possible in the engagement process, preferably during the design phase. The audit plan should reflect data reliability issues and any additional steps that still need to be performed to assess the reliability of critical data. The engagement team generally should not finalize the audit plan or issue a commitment letter until it has done initial testing and reviewed existing information about the data and the system that produces the data. In addition, the team should not commit to making conclusions or recommendations based on the data unless the team expects to be satisfied with the data reliability. Documenting the Assessment: All work performed as part of the data reliability assessment should be documented and included in the engagement workpapers. This includes all testing, information review, and interviews related to data reliability. In addition, decisions made during the assessment, including the final assessment of whether the data are sufficiently reliable for the purposes of the engagement, should be summarized and included with the workpapers. These workpapers should be (1) clear about what steps the team took and what conclusions they reached and (2) reviewed by staff with appropriate skills or, if needed, technical specialists. [End of section] Section 5: Viewing the Entire Assessment Process: The ultimate goal of the data reliability assessment is to determine whether you can use the data to answer the research question. The assessment should be performed only for those portions of the data that are relevant to the engagement. 
The extensiveness of the assessment is driven by: * the expected significance of the data to the final report; * the anticipated risk level of using the data, and; * the strength or weakness of any corroborating evidence. Therefore, the specific assessment process should take into account these factors along with what is learned during the initial stage of the assessment. The process is likely to be different for each engagement. The overall framework of the process for data reliability assessment is shown in figure 3. The framework identifies several key stages in the assessment, as well as actions and decisions expected as you move through the process. The framework allows you to identify the appropriate mix of assessment steps to fit the particular needs of your engagement. In most cases, all of the elements in figure 3 would not be necessary in completing the assessment. Specific actions for each stage are discussed in sections 6-10. Figure 3: Data Reliability Assessment Process: [See PDF for image] This figure is a flow chart of the Data Reliability Assessment Process, depicting the following information: Taking the First Steps: * What is known about the data and the system? * Obtain electronic or hard copy data; * Action: Perform initial testing; * Action: Review existing information about the data and the system. Making the Preliminary Assessment: * What is the preliminary assessment of reliability? - Sufficiently reliable to answer research question: Use data and disclose any limitations; - Not sufficiently reliable to answer research question: Take optional actions; - Undetermined. Conducting Additional Work: Consider these factors: * Combination of actions: Anticipated significance of the data in answering the research question; Strength of corroborating evidence; Degree of risk involved. What is most appropriate mix of additional work? 
Some options for additional work: * Combination of actions: Trace to or from source documents; Use advanced electronic testing; Review selected system controls. Making the Final Assessment: * What is the final assessment of reliability? - Sufficiently reliable to answer research question: Use data and disclose any limitations; - Not sufficiently reliable to answer research question: Take optional actions. Source: GAO. [End of figure] [End of section] Section 6: Taking the First Steps: The data reliability process begins with two relatively simple steps. These steps provide the basis for making a preliminary assessment of data reliability: (1) a review of related information and (2) initial testing (see figure 4). In some situations, you may have an extremely short time frame for the engagement; this section also provides some advice for this situation. The time required to review related information and perform initial testing will vary, depending on the engagement and the amount of risk involved. As discussed in section 4, these steps should take place early in the engagement and include the team members, as well as appropriate technical staff. Figure 4: The First Steps of the Assessment: [See PDF for image] This figure is a flow chart of the First Steps of the Assessment, depicting the following information: What is known about the data and the system? * Obtain electronic or hard copy data; * Action: Perform initial testing; * Action: Review existing information about the data and the system. Source: GAO. [End of figure] Reviewing Existing Information: The first step—a review of existing information—helps you to determine what is already known about the data and the computer processing. The related information you collect can indicate both the accuracy and completeness of the entry and processing of the data, as well as how data integrity is maintained. 
This information can be in the form of reports, studies, or interviews with individuals who are knowledgeable about the data and the system. Sources for related information include GAO, the agency under review, and others. GAO: GAO may already have related information in reports. Those from fiscal year 1995 to the present are available via GAO’s Internet site. This site also provides other useful information: for example, as part of the annual governmentwide consolidated financial audit, GAO’s Information Technology Team is involved with reporting on the effectiveness of controls for financial information systems at 24 major federal agencies. Agency under Review: Officials of the agency or entity under review are aware of evaluations of their computer data or systems and usually can direct you to both. However, keep in mind that information from agency officials may be biased. Consider asking appropriate technical specialists to help in evaluating this information. Agency information includes Inspector General reports, Federal Managers’ Financial Integrity Act reports, Government Performance and Results Act (GPRA) plans and reports, Clinger-Cohen Act reports, and Chief Information Officer reports. (Some of this information can be found in agency homepages on the Web.) Others: Other organizations and users of the data may be sources of relevant information. To help you identify these sources, you can use a variety of databases and other research tools, which include the Congressional Research Service Public Policy Literature Abstracts and organizations' Web sites. Performing Initial Testing: The second step—initial testing—can be done by applying logical tests to electronic data files or hard copy reports. For electronic data, you use computer programs to test all entries of key data elements in the entire data file. [Footnote 5] Keep in mind that you only test those data elements you plan to use for the engagement. 
You will find that testing with computer programs often takes less than a day, depending on the complexity of the file. For hard copy or summarized data—provided by the audited entity or retrieved from the Internet—you can ask for the electronic data file used to create the hard copy or summarized data. If you are unable to obtain electronic data, use the hard copy or summarized data and, to the extent possible, manually apply the tests to all instances of key data elements or, if the report or summary is voluminous, to a sample of them. Whether you have an electronic data file or a hard copy report or summary, you apply the same types of tests to the data. These can include testing for: * missing data, either entire records or values of key data elements; * the relationship of one data element to another; * values outside of a designated range; and; * dates outside valid time frames or in an illogical progression. Be sure to keep a log of your testing for inclusion in the engagement workpapers. Dealing with Short Time Frames: In some instances, the engagement may have a time frame that is too short for a complete preliminary assessment, for example, a request for testimony in 2 weeks. However, given that all engagements are a function of time, as well as scope and resources, limitations in one require balancing the others. Despite a short time frame, you may have time to review existing information and carry out testing of data that are critical for answering a research question, for example: You can question knowledgeable agency staff about data reliability or review existing GAO or Inspector General reports to quickly gather information about data reliability issues. In addition, electronic testing of critical data elements for obvious errors of completeness and accuracy can generally be done in a short period of time on all but the most complicated or immense files. 
From that review and testing, you will be able to make a more informed determination about whether the data are sufficiently reliable to use for the purposes of the engagement. (See sections 7 and 8 for the actions to take, depending on your determination.) [End of section] Section 7: Making the Preliminary Assessment: The preliminary assessment is the first decision point in the assessment process, including the consideration of multiple factors, a determination of the sufficiency of the data reliability with what is known at this point, and a decision about whether further work is required. You will decide whether the data are sufficiently reliable for the purposes of the engagement, not sufficiently reliable, or as yet undetermined. Keep in mind that you are not attesting to the overall reliability of the data or database. You are only determining the reliability of the data as needed to support the findings, conclusions, or recommendations of the engagement. As you gather information and make your judgments, consult appropriate technical specialists for assistance. Factors to Consider in the Assessment: To make the preliminary assessment of the sufficiency of the data reliability for the engagement, you should consider all factors related to aspects of the engagement, as well as assessment work performed to this point. As shown in figure 5, these factors include: * the expected significance of the data in the final report; * corroborating evidence; * level of risk, and; * the results of initial assessment work. Figure 5: The Preliminary Assessment: [See PDF for image] This figure is a flow chart of the preliminary assessment, depicting the following information: Factors: * Results of initial testing; * Results of review of existing information; * Anticipated significance of the data in answering the research question; * Strength of corroborating evidence; * Degree of risk involved. 
All factors assist in answering the question: What is the preliminary assessment of reliability? * Sufficiently reliable: Use data and disclose limitations, if any; * Not sufficiently reliable: Take optional actions; * Undetermined. Source: GAO. [End of figure] Expected Significance of the Data in the Final Report: In making the preliminary assessment, consider the data in the context of the final report: Will the engagement team depend on the data alone to answer a research question? Will the data be summarized or will detailed information be required? Is it important to have precise data, making magnitude of errors an issue? Corroborating Evidence: You should consider the extent to which corroborating evidence is likely to exist and will independently support your findings, conclusions, or recommendations. Corroborating evidence is independent evidence that supports information in the database. Such evidence, if available, can be found in the form of alternative databases or expert views. It is unique to each engagement, and its strength—persuasiveness—varies. For help in deciding the strength or weakness of corroborating evidence, consider the extent to which the corroborating evidence: * is consistent with the "Yellow Book" standards of evidence—sufficiency, competence, and relevance; * provides crucial support; * is drawn from different types of sources—testimonial, documentary, physical, or analytical; and; * is independent of other sources. Level of Risk: Risk is the likelihood that using data of questionable reliability could have significant negative consequences on the decisions of policymakers and others. To do a risk assessment, consider the following risk conditions: * The data could be used to influence legislation, policy, or a program that could have significant impact. * The data could be used for significant decisions by individuals or organizations with an interest in the subject. 
* The data will be the basis for numbers that are likely to be widely quoted, for example, "In 1999, the United States owed the United Nations about $1.3 billion for the regular and peacekeeping budgets." * The engagement is concerned with a sensitive or controversial subject. * The engagement has external stakeholders who have taken positions on the subject. * The overall engagement risk is medium or high. * The engagement has unique factors that strongly increase risk. Bear in mind that any one of the conditions may have more importance than another, depending on the engagement. Results of Initial Assessment Work: At this point, as shown in figure 5 (p. 19), the team will already have performed the initial stage of the data reliability assessment. They should have the results from the (1) review of all available existing information about the data and the system that produced them and (2) initial testing of the critical data elements. These results should be appropriately documented and reviewed before the team enters into the decision-making phase of the preliminary assessment. Because the results will, in whole or in part, provide the evidence that the data are sufficiently reliable—and therefore competent enough—or not sufficiently reliable for the purposes of the engagement, the workpapers should include documentation of the process and results. Outcomes to Consider in the Assessment: The results of your combined judgments of the strength of corroborating evidence and degree of risk suggest different assessments. If the corroborating evidence is strong and the risk is low, the data are more likely to be considered sufficiently reliable for your purposes. If the corroborating evidence is weak and the risk is high, the data are more likely to be considered not sufficiently reliable for your purposes. The overall assessment is a judgment call, which should be made in the context of discussion with team management and technical specialists. 
The preliminary assessment categorizes the data as sufficiently reliable, not sufficiently reliable, or of undetermined reliability. Each category has implications for the next steps of the data reliability assessment. When to Assess Data as Sufficiently Reliable for Engagement Purposes: You can assess the data as sufficiently reliable for engagement purposes when you conclude the following: Both the review of related information and the initial testing provide assurance that (1) the likelihood of significant errors or incompleteness is minimal and (2) the use of the data would not lead to an incorrect or unintentional message. You could have some problems or uncertainties about the data, but they would be minor, given the research question and intended use of the data. When the preliminary assessment indicates that the data are sufficiently reliable, use the data. When to Assess Data as Not Sufficiently Reliable for Engagement Purposes: You can assess the data as not sufficiently reliable for engagement purposes when you conclude the following: The review of related information or initial testing indicates that (1) significant errors or incompleteness exist in some or all of the key data elements and (2) using the data would probably lead to an incorrect or unintentional message. When the preliminary assessment indicates that the data are not sufficiently reliable, you should seek evidence from other sources, including (1) alternative computerized data—the reliability of which you should also assess—or (2) original data in the form of surveys, case studies, or expert interviews. You should coordinate with the requester if seeking evidence from other sources does not result in a source of sufficiently reliable data. Inform the requester that such data, needed to respond to the request, are unavailable. 
Reach an agreement with the requester to: * redefine the research questions to eliminate the need to use the data; * end the engagement; or; * use the data with appropriate disclaimers. Remember that you—not the requester—are responsible for deciding what data to use. If you decide you must use data that you have determined are not sufficiently reliable for the purposes of the engagement, make the limitations of the data clear, so that incorrect or unintentional conclusions will not be drawn. Finally, given that the data you assessed have serious reliability weaknesses, you should include this finding in the report and recommend that the agency take corrective action. When to Assess Data as of Undetermined Reliability and Consider Additional Work: You can assess the data as of undetermined reliability when you conclude one of the following: * The review of some of the related information or initial testing raises questions about the data’s reliability. * The related information or initial testing provides too little information to judge reliability. * Time or resource constraints limit the extent of the examination of related information or initial testing. When the preliminary assessment indicates that the reliability of the data is undetermined, consider doing additional work to determine reliability. Section 8 provides guidance on the types of additional work to consider, as well as suggestions if no additional work is feasible. [End of section] Section 8: Conducting Additional Work: When you have determined (through the preliminary assessment) that the data are of undetermined reliability, consider conducting additional work (see figure 6). A range of additional steps to further determine data reliability includes tracing to and from source documents, using advanced electronic testing, and reviewing selected system controls.
The mix depends on what weaknesses you identified in the preliminary assessment and the circumstances specific to your engagement, such as risk level and corroborating evidence, as well as other factors. Focus particularly on those aspects of the data that pose the greatest potential risk for your engagement. You should get help from appropriate technical specialists to discuss whether additional work is required and to carry out any part of the additional reliability assessment. Figure 6: Choosing and Conducting Additional Work: [See PDF for image] This figure is a flow chart of the process of choosing and conducting additional work. The following information is depicted: What is most appropriate mix of additional work? Factors: * Results of initial testing; * Results of review of existing information; * Consider these factors: - Anticipated significance of the data in answering the research question; - Strength of corroborating evidence; - Degree of risk involved; * Some options for additional work: - Trace to or from source documents; - Use advanced electronic testing; - Review selected system controls. Source: GAO. [End of figure] Tracing to and from Source Documents: Tracing a sample of data records to source documents helps you to determine whether the computer data accurately and completely reflect these documents. In deciding what and how to trace, consider the relative risks to the engagement of overstating or understating the conclusions drawn from the data, for example: On the one hand, if you are particularly concerned that questionable cases might not have been entered into the computer system and that as a result, the degree of compliance may be overstated, you should consider tracing from source documents to the database. 
On the other hand, if you are more concerned that ineligible cases have been included in the database and that as a result, the potential problems may be understated, you should consider tracing from the database back to source documents. The reason to trace only a sample is that sampling saves time and cost. To be useful, however, the sample should be random and large enough to estimate the error rate within reasonable levels of precision. Tracing a random sample will provide the error rate and the magnitude of errors for the entire data file. It is this error rate that helps you to determine the data reliability. Generally, every data file will have some degree of error (see example 1 for error rate and example 2 for magnitude of errors). Consult statisticians to assist you in selecting the sampling method most suited to the engagement. Example 1: According to a random sample, 10 percent of the data records have incorrect dates. However, the dates may be off by an average of only 3 days. Depending on what the data are used for, 3 days may not compromise reliability. Example 2: The value of a data element was incorrectly entered as $100,000, rather than $1,000,000. The documentation of the database shows that the acceptable range for this data element is between $100 and $5,000,000. Therefore, the electronic testing done in the initial testing phase would have confirmed that the value of $100,000 fell within that range. In this case, the error could be caught, not by electronic testing, but only by tracing the data to source documents. Tracing to Source Documents: Consider tracing to source documents when (1) the source documents are relatively easy to obtain or (2) the possible magnitude of errors is especially critical. To trace a sample to source documents, match the entered data with the corresponding data in the source documents.
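The precision point above can be illustrated with a minimal sketch (all figures hypothetical, not from this guidance): estimating the error rate from a traced random sample, together with a normal-approximation 95 percent confidence interval.

```python
import math

def error_rate_estimate(errors_found, sample_size, z=1.96):
    """Estimate a data file's error rate from a traced random sample,
    with a normal-approximation 95 percent confidence interval."""
    p = errors_found / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical tracing result: 10 errors found in a random sample of 100 records.
rate, low, high = error_rate_estimate(10, 100)
print(f"Estimated error rate: {rate:.0%} (95% CI: {low:.1%} to {high:.1%})")
```

This sketch shows only the precision arithmetic; as the guidance notes, statisticians should be consulted on the sampling method itself.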
But in attempting to trace entered data back to source documents, several problems can arise: Source documents may not be available because they were destroyed, were never created, or are not centrally located. Several options exist if source documents are not available. For those documents never created—for example, when data may be based on electronic submissions—use interviews to obtain related information, any corroborating evidence obtained earlier, or a review of the adequacy of system controls. Tracing from Source Documents: Consider tracing from source documents, instead of or in addition to tracing a sample to source documents, when you have concerns that the data are not complete. To trace a sample from source documents, match the source documents with the entered data. Such tracing may be appropriate to determine whether all data are completely entered. However, if source documents were never created or are now missing, you cannot identify the missing data. Using Advanced Electronic Testing: Advanced electronic testing goes beyond the basic electronic testing that you did in initial testing (see section 6). It generally requires specialized computer programs to test for specific conditions in the data. Such testing can be particularly helpful in determining the accuracy and completeness of processing by the application system that produced the data. Consider using advanced electronic testing for: * following up on troubling aspects of the data—such as extremely high values associated with a certain geographic location—found in initial testing or while analyzing the data; * testing relationships—cross-tabulation—between data elements, such as whether data elements follow a skip pattern from a questionnaire; and * verifying that computer processing is accurate and complete, such as testing a formula used in generating specific data elements.
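The guidance does not prescribe particular tools, but the three kinds of tests just listed can be sketched as follows (all field names, records, and edit rules here are hypothetical illustrations, not drawn from any actual engagement):

```python
# Hypothetical records; field names and rules are illustrative only.
records = [
    {"region": "East", "amount": 1200.0, "employed": "no",
     "employer_name": "Acme Co.", "unit_price": 4.0, "qty": 300, "total": 1200.0},
    {"region": "West", "amount": 9_500_000.0, "employed": "no",
     "employer_name": "", "unit_price": 5.0, "qty": 10, "total": 55.0},
]

# 1. Follow up on troubling aspects of the data, such as extremely high values.
outliers = [r for r in records if r["amount"] > 1_000_000]

# 2. Test a skip pattern: respondents who answered "no" to being employed
#    should have no employer name entered.
skip_violations = [r for r in records
                   if r["employed"] == "no" and r["employer_name"]]

# 3. Verify processing: recompute a stored total from its input elements.
bad_totals = [r for r in records
              if abs(r["unit_price"] * r["qty"] - r["total"]) > 0.005]

print(len(outliers), len(skip_violations), len(bad_totals))
```

Each list of flagged records would then be followed up with the agency or traced to source documents, consulting technical specialists as needed.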
Depending on what will be tested, this testing can require a range of programming skills—from creating cross-tabulations on related data elements to duplicating an intricate automated process with more advanced programming techniques. Consult appropriate technical specialists, as needed. Reviewing Selected System Controls: Your review of selected system controls—the underlying structures and processes of the computer in which the data are maintained—can provide some assurance that the data are sufficiently reliable. Examples of system controls are limits on access to the system and edit checks on data entered into the system. Controls can reduce, to an acceptable level, the risk that a significant mistake could occur and remain undetected and uncorrected. Limit the review to evaluating the specific controls that can most directly affect the reliability of the data in question. Choose areas for review on the basis of what is known about the system. Sometimes, you identify potential system control problems in the initial steps of the assessment. Other times, you learn during the preliminary assessment that source documents are not readily available. In either case, a review of selected system controls may be the best method to determine whether data were entered reliably. If needed, consult information system auditors for help in evaluating general and application controls. Using what you know about the system, concentrate on evaluating the controls that most directly affect the data. These controls will usually include (1) certain general controls, such as logical access and control of changes to the data, and (2) the application controls that help to ensure that the data are accurate and complete, as well as authorized.
The steps for reviewing selected system controls are: * gain a detailed understanding of the system as it relates to the data; and * identify and assess the application and general controls that are critical to ensuring the reliability of the data required for the engagement. Using Data of Undetermined Reliability: In some situations, it may not be feasible to perform any additional work, for example, when (1) the time frame is too short for a complete assessment, (2) original computer files have been deleted, or (3) access to needed documents is unavailable. See section 9 for how to proceed. [End of section] Section 9: Making the Final Assessment: During the final assessment, you should consider the results of all your previous work to determine whether, for your intended use, the data are sufficiently reliable, not sufficiently reliable, or still undetermined. Again, remember that you are not attesting to the reliability of the data or database. You are only determining the sufficiency of the reliability of the data for your intended use. The final assessment will help you decide what actions to take (see figure 7). Figure 7: Making the Final Assessment: [See PDF for image] This figure is a flow chart of the process for making the final assessment. The following information is depicted: Factors to consider: * Significance of the data in answering the research question; * Results of initial testing; * Strength of corroborating evidence; * Results of review of existing information; * Degree of risk involved; * Results of any additional work; What is the final assessment of reliability? Sufficiently reliable: Use data and disclose limitations, if any; Not sufficiently reliable: Take optional actions. Source: GAO. [End of figure] The following are some considerations to help you decide whether you can use the data: * The corroborating evidence is strong. * The degree of risk is low.
* The results of additional assessment (1) answered issues raised in the preliminary assessment and (2) did not raise any new questions. * The error rate, in tracing to or from source documents, did not compromise reliability. In making this assessment, you should consult with appropriate technical specialists. Sufficiently Reliable Data: You can consider the data sufficiently reliable when you conclude the following: On the basis of the additional work, as well as the initial assessment work, using the data would neither weaken the analysis nor lead to an incorrect or unintentional message. You could have some problems or uncertainties about the data, but they would be minor, given the research question and intended use of the data. When your final assessment indicates that the data are sufficiently reliable, use the data. Not Sufficiently Reliable Data: You can consider the data to be not sufficiently reliable when you conclude the following: On the basis of information drawn from the additional assessment, as well as the preliminary assessment, (1) using the data would most likely lead to an incorrect or unintentional message and (2) the data have significant or potentially significant limitations, given the research question and intended use of the data. When you determine that the data are not sufficiently reliable, you should inform the requester that sufficiently reliable data, needed to respond to the request, are unavailable. Remember that you—not the requester—are responsible for deciding what data to use. Although the requester may want information based on insufficiently reliable data, you are responsible for ensuring that data are used appropriately to respond to the requester. If you decide to use the data for the report, make the limitations of the data clear, so that incorrect or unintentional conclusions will not be drawn. Appropriate team management should be consulted before you agree to use data that are not sufficiently reliable.
Finally, given that the data you assessed have serious reliability weaknesses, you should include this finding in the report and recommend that the agency take corrective action. Data of Undetermined Reliability: You can consider the data to be of undetermined reliability when you conclude the following: On the basis of the information drawn from any additional work, as well as the preliminary assessment, (1) use of the data could lead to an incorrect or unintentional message and (2) the data have significant or potentially significant limitations, given the research question and the intended use. You can consider the data to be of undetermined reliability if specific factors—such as short time frames, the deletion of original computer files, and the lack of access to needed documents—are present. If you decide to use the data, make the limitations of the data clear, so that incorrect or unintentional conclusions will not be drawn. As noted above in the case of not sufficiently reliable data, when you determine that the data are of undetermined reliability, you should inform the requester—if appropriate—that sufficiently reliable data, needed to respond to the request, are unavailable. Remember that you—not the requester—are responsible for deciding what data to use. Although the requester may want information based on data of undetermined reliability, you are responsible for ensuring that appropriate data are used to respond to the requester. If you decide to use the data in your report, make the limitations clear, so that incorrect or unintentional conclusions will not be drawn. Appropriate team management should be consulted before you agree to use data of undetermined reliability. [End of section] Section 10: Including Appropriate Language in the Report: In the report, you should include a statement in the methodology section about conformance to generally accepted government auditing standards (GAGAS).
These standards refer to how you did your work, not how reliable the data are. Therefore, you are conforming to GAGAS as long as, in reporting, you discuss what you did to assess the data; disclose any data concerns; and reach a judgment about the reliability of the data for use in the report. Furthermore, in the methodology section, include a discussion of your assessment of data reliability and the basis for this assessment. The language in this discussion will vary, depending on whether the data are sufficiently reliable, not sufficiently reliable, or of undetermined reliability. In addition, you may need to discuss the reliability of the data in other sections of the report. Whether you do so depends on the importance of the data to the message. Sufficiently Reliable Data: Present your basis for assessing the data as sufficiently reliable, given the research questions and intended use of the data. This presentation includes (1) noting what kind of assessment you relied on, (2) explaining the steps in the assessment, and (3) disclosing any data limitations. Such disclosure includes: * telling why using the data would not lead to an incorrect or unintentional message; * explaining how limitations could affect any expansion of the message; and * pointing out that any data limitations are minor in the context of the engagement.
In addition, if the report contains a conclusion or recommendation supported by evidence other than these data, state that fact. Finally, if the data you assessed are not sufficiently reliable, you should include this finding in the report and recommend that the audited entity take corrective action. Data of Undetermined Reliability: Present your basis for assessing the reliability of the data as undetermined. Include such factors as short time frames, the deletion of original computer files, and the lack of access to needed documents. Explain the reasonableness of using the data, for example: These are the only available data on the subject; the data are widely used by outside experts or policymakers; or the data are supported by credible corroborating evidence. In addition, make the limitations of the data clear, so that incorrect or unintentional conclusions will not be drawn from the data. For example, indicate how the use of these data could lead to an incorrect or unintentional message. Finally, if the report contains a conclusion or recommendation supported by evidence other than these data, state that fact. [End of section] Glossary of Technical Terms: accuracy: Freedom from error in the data. completeness: The inclusion of all necessary parts or elements. database: A collection of related data files (for example, questionnaire responses from several different groups of people, with each group’s identity maintained). data element: An individual piece of information that has definable parameters, sometimes referred to as variables or fields (for example, the response to any question in a questionnaire). data file: A collection of related data records, also referred to as a data set (for example, the collected questionnaire responses from a group of people).
data record: A collection of related data elements that relate to a specific event, transaction, or occurrence (for example, questionnaire responses about one individual—such as age, sex, and marital status). source document: Information that is the basis for entry of data into a computer. [End of section] Footnotes: [1] U.S. General Accounting Office, Government Auditing Standards, GAO/OGC-94-4 (Washington, D.C.: June 1994), pp. 62-87. [2] A data element is a unit of information with definable parameters (for example, a Social Security number), sometimes referred to as a data variable or data field. [3] General controls refers to the structure, policies, and procedures—which apply to all or a large segment of an organization’s information systems—that help to ensure proper operation, data integrity, and security. Application controls refers to the structure, policies, and procedures that apply to individual application systems, such as inventory or payroll. [4] Guidance for carrying out reviews of general and application controls is provided in the U.S. General Accounting Office, Federal Information System Controls Audit Manual, GAO/AIMD-12.19.6 (Washington, D.C.: Jan. 1999). [5] Though an in-depth discussion of quality-assurance practices to be used in electronic testing and analyses is beyond the scope of this guidance, it is important to perform appropriate checks to ensure that you have obtained the correct file. All too often, analysts receive an incorrect file (an early version or an incomplete file). Appropriate steps would include counting records and comparing totals with the responsible agency or entity. [End of section] GAO’s Mission: The General Accounting Office, the investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people.
GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions. GAO’s commitment to good government is reflected in its core values of accountability, integrity, and reliability. Obtaining Copies of GAO Reports and Testimony: The fastest and easiest way to obtain copies of GAO documents at no cost is through the Internet. GAO’s Web site [hyperlink, http://www.gao.gov] contains abstracts and full text files of current reports and testimony and an expanding archive of older products. The Web site features a search engine to help you locate documents using key words and phrases. You can print these documents in their entirety, including charts and other graphics. Each day, GAO issues a list of newly released reports, testimony, and correspondence. GAO posts this list, known as “Today’s Reports,” on its Web site daily. The list contains links to the full-text document files. To have GAO e-mail this list to you every afternoon, go to [hyperlink, http://www.gao.gov] and select “Subscribe to daily E-mail alert for newly released products” under the GAO Reports Order GAO Products heading. Order by Mail or Phone: The first copy of each printed report is free. Additional copies are $2 each. A check or money order should be made out to the Superintendent of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or more copies mailed to a single address are discounted 25 percent. Orders should be sent to: U.S. General Accounting Office: 441 G Street NW, Room LM: Washington, D.C. 
20548: To order by Phone: Voice: (202) 512-6000: TDD: (202) 512-2537 Fax: (202) 512-6061 To Report Fraud, Waste, and Abuse in Federal Programs Contact: Web site: [hyperlink, http://www.gao.gov/fraudnet/fraudnet.htm]: E-mail: fraudnet@gao.gov: Automated answering system: (800) 424-5454 or (202) 512-7470: Public Affairs: Jeff Nelligan, managing director, NelliganJ@gao.gov: (202) 512-4800: U.S. General Accounting Office: 441 G Street NW, Room 7149: Washington, D.C. 20548: