This is the accessible text file for GAO report number GAO-03-273G
entitled 'Assessing the Reliability of Computer-Processed Data' which
was released on October 1, 2002.
This text file was formatted by the U.S. Government Accountability
Office (GAO) to be accessible to users with visual impairments, as part
of a longer term project to improve GAO products' accessibility. Every
attempt has been made to maintain the structural and data integrity of
the original printed product. Accessibility features, such as text
descriptions of tables, consecutively numbered footnotes placed at the
end of the file, and the text of agency comment letters, are provided
but may not exactly duplicate the presentation or format of the printed
version. The portable document format (PDF) file is an exact electronic
replica of the printed version. We welcome your feedback. Please E-mail
your comments regarding the contents or accessibility features of this
document to Webmaster@gao.gov.
This is a work of the U.S. government and is not subject to copyright
protection in the United States. It may be reproduced and distributed
in its entirety without further permission from GAO. Because this work
may contain copyrighted images or other material, permission from the
copyright holder may be necessary if you wish to reproduce this
material separately.
Note: In July 2009, this product was superseded by GAO-09-680G, Applied
Research and Methods: Assessing the Reliability of Computer-Processed
Data, available at [hyperlink, http://www.gao.gov/products/GAO-09-680G].
United States General Accounting Office:
GAO:
Applied Research and Methods:
October 2002:
External Version I:
Assessing the Reliability of Computer-Processed Data:
GAO-03-273G:
Contents:
Preface:
Section 1: Introduction:
Section 2: Understanding Data Reliability:
Section 3: Deciding If a Data Reliability Assessment Is Necessary:
Conditions Requiring a Data Reliability Assessment:
Conditions Not Requiring a Data Reliability Assessment:
Section 4: Performing a Data Reliability Assessment:
Timing the Assessment:
Documenting the Assessment:
Section 5: Viewing the Entire Assessment Process:
Section 6: Taking the First Steps:
Reviewing Existing Information:
Performing Initial Testing:
Dealing with Short Time Frames:
Section 7: Making the Preliminary Assessment:
Factors to Consider in the Assessment:
Outcomes to Consider in the Assessment:
Section 8: Conducting Additional Work:
Tracing to and from Source Documents:
Using Advanced Electronic Testing:
Reviewing Selected System Controls:
Using Data of Undetermined Reliability:
Section 9: Making the Final Assessment:
Sufficiently Reliable Data:
Not Sufficiently Reliable Data:
Data of Undetermined Reliability:
Section 10: Including Appropriate Language in the Report:
Sufficiently Reliable Data:
Not Sufficiently Reliable Data:
Data of Undetermined Reliability:
Glossary of Technical Terms:
Figures:
Figure 1: Factors to Consider in Making the Decision on Using the Data:
Figure 2: Decision Process for Determining If a Data Reliability
Assessment Is Required:
Figure 3: Data Reliability Assessment Process:
Figure 4: The First Steps of the Assessment:
Figure 5: The Preliminary Assessment:
Figure 6: Choosing and Conducting Additional Work:
Figure 7: Making the Final Assessment:
[End of section]
Preface:
Computer-processed data, often from external sources, increasingly
underpin audit reports, including evaluations (performance audits) and
financial audits. Therefore, the reliability of such data has become
more and more important. Historically, computer-processed data have been
treated as unique evidence. However, these data are simply one form of
evidence relied on, although they may require more technical assessment
than other forms of evidence. In addition, the very nature of the
information system creating the data allows opportunities for errors to
be introduced by many people.
This guidance is intended to demystify the assessment of computer-
processed data. It supplements GAO’s “Yellow Book” (Government Auditing
Standards, 1994 Revision), which defines the generally accepted
government auditing standards (GAGAS), and replaces the earlier GAO
guidance, Assessing the Reliability of Computer-Processed Data
(GAO/OP-8.1.3, Sept. 1990).
For all types of evidence, various tests are used—sufficiency,
competence, and relevance—to assess whether the evidence standard is
met. You probably have been using these tests for years and have become
quite proficient at them. But because assessing computer-processed data
requires more technical tests, it may appear that such data are subject
to a higher standard of testing than other evidence. That is not the
case. For example, many of the same tests of sufficiency and relevance
are applied to other types of evidence. But in assessing computer-
processed data, the focus is on one test in the evidence
standard—competence—which includes validity and reliability.
Reliability, in turn, includes the completeness and accuracy of the
data.
This guidance, therefore, provides a flexible, risk-based framework for
data reliability assessments that can be geared to the specific
circumstances of each engagement. The framework also provides a
structure for planning and reporting, facilitates bringing the right
mix of skills to each engagement, and ensures timely management buy-in
on assessment strategies. The framework is built on:
* making use of all existing information about the data;
* performing at least a minimal level of data testing;
* doing only the amount of work necessary to determine whether the data
are reliable enough for our purposes;
* maximizing professional judgment, and;
* bringing the appropriate people, including management, to the table at
key decision points.
The ultimate goal of the data reliability assessment is to determine
whether you can use the data for your intended purposes. This guidance
is designed to help you make an appropriate, defensible assessment in
the most efficient manner. If you have related questions, call Barbara
Johnson, focal point for data reliability issues, at (202) 512-3663, or
Barry Seltser, the Acting Director of GAO’s Center for Design, Methods,
and Analysis, at (202) 512-3234.
Signed by:
Nancy Kingsbury:
Managing Director, Applied Research and Methods:
[End of section]
Section 1: Introduction:
This guidance explains what data reliability means and provides a
framework for assessing the reliability of computer-processed data. It
begins with the steps in a preliminary assessment, which, in many cases,
may be all you need to do to assess reliability. This guidance also
helps you decide whether you should follow up the preliminary
assessment with additional work. If so, it explains the steps in a
final assessment and the actions to take, depending on the results of
your additional work. The ultimate goal in determining data reliability
is to make the following decision: For our engagement, can we use the
data to answer the research question? See figure 1 for an overview of
the factors that help to inform that decision. Not all of these factors
may be necessary for all engagements.
Figure 1: Factors to Consider in Making the Decision on Using the Data:
[See PDF for image]
This figure is an illustration of the factors to consider in making the
decision on using the data. The following information is depicted:
Use the Data or Not:
Factors:
Significance of data in answering research questions;
Results of preliminary assessment;
Strength of corroborating evidence;
Results of review of selected system controls;
Results of advanced electronic testing;
Results of tracing to or from source documents;
Degree of risk.
Source: GAO.
[End of figure]
In addition, this guidance discusses suggested language—appropriate
under different circumstances—for reporting the results of your
assessment. Finally, it provides detailed descriptions of all the
stages of the assessment, as well as a glossary of technical terms
used. An on-line version of this guidance, which will include tools
that may help you in assessing reliability, is currently being
developed. The overall process is illustrated in figures 2 and 3.
[End of section]
Section 2: Understanding Data Reliability:
Data reliability refers to the accuracy and completeness of computer-
processed data, given the intended purposes for use. Computer-processed
data include data (1) entered into a computer system and (2) resulting
from computer processing. Computer-processed data can vary in form—from
electronic files to tables in published reports. The definition of
computer-processed data is therefore broad. In this guidance, the term
data always refers to computer-processed data.
The “Yellow Book” requires that a data reliability assessment be
performed for all data used as support for engagement findings,
conclusions, or recommendations. [Footnote 1] This guidance will help
you to design a data reliability assessment appropriate for the
purposes of the engagement and then to evaluate the results of the
assessment.
Data are reliable when they are (1) complete (they contain all of the
data elements and records needed for the engagement) [Footnote 2] and
(2) accurate (they reflect the data entered at the source or, if
available, in the source documents). A subcategory of accuracy is
consistency. Consistency refers to the need to obtain and use data that
are clear and well-defined enough to yield similar results in similar
analyses. For example, if data are entered at multiple sites,
inconsistent interpretation of data rules can lead to data that, taken
as a whole, are unreliable. Reliability also means that for any
computer processing of the data elements used, the results are
reasonably complete and accurate, meet your intended purposes, and are
not subject to inappropriate alteration.
Assessments of reliability should be made in the broader context of the
particular characteristics of the engagement and the risk associated
with the possibility of using data of insufficient reliability.
Reliability does not mean that computer-processed data are error-free.
Errors are considered acceptable under these circumstances: You have
assessed the associated risk and found the errors are not significant
enough to cause a reasonable person, aware of the errors, to doubt a
finding, conclusion, or recommendation based on the data.
While this guidance focuses only on the reliability of data in terms of
accuracy and completeness, other data quality considerations are just as
important. In particular, you should also consider the validity of data.
Validity (as used here) refers to whether the data actually represent
what you think is being measured. For example, if a data field is named
“annual evaluation score,” is this an appropriate measure of a person’s
job performance? Considerations of data validity and reliability issues
should be addressed early in the engagement, and appropriate technical
specialists—such as data analysts, statisticians, or information
technology specialists—should be consulted.
[End of section]
Section 3: Deciding If a Data Reliability Assessment Is Necessary:
To decide if a data reliability assessment is necessary, you should
consider certain conditions. The engagement type and planned use of the
data help to determine when you should assess data reliability. See
figure 2 for an illustration of the decision process that you should
use.
Figure 2: Decision Process for Determining If a Data Reliability
Assessment Is Required:
[See PDF for image]
This figure is a flow chart of the decision process for determining if
a data reliability assessment is required. The following information is
depicted:
What is the type of engagement?
Financial or financial-related audit;
Use guidance in FAM or FISCAM.
What is the type of engagement?
All other engagements.
Do you anticipate that the data will be significant to findings,
conclusions, or recommendations?
If yes: Does the research question require a determination of the
reliability of an information system?
If no (primarily background information):
* Determine if best available source;
* Disclose the source and that no reliability assessment was performed.
Does the research question require a determination of the reliability
of an information system?
If yes: Conduct a computer system review and disclose in the
objectives, scope, and methodology (OSM) section the work done,
results, and any limitations found;
If no: Will the data be used on multiple future engagements?
Will the data be used on multiple future engagements?
If yes: Should you do a computer system review?
If no: Continue with a data reliability assessment.
Should you do a computer system review?
If yes: Conduct a computer system review and disclose in the OSM
section the work done, results, and any limitations found;
If not at this time: Continue with a data reliability assessment.
Source: GAO.
[End of figure]
Conditions Requiring a Data Reliability Assessment:
You should assess reliability if the data to be analyzed are intended to
support the engagement findings, conclusions, or recommendations. Keep
in mind that a finding may include only a description of the condition,
as in a purely descriptive report. In the audit plan for the
engagement, you should include a brief discussion of how you plan to
assess data reliability, as well as any limitations that may exist due
to shortcomings in the data.
Conditions Not Requiring a Data Reliability Assessment:
You do not need to assess reliability if the data are used (1) only as
background information or (2) in documents without findings,
conclusions, or recommendations. Background information generally sets
the stage for reporting the results of an engagement or provides
information that puts the results in proper context. Such information
could be the size of the program or activity you are reviewing, for
example. When you gather background or other data, ensure that they are
from the best available source(s). When you present the data, cite the
source(s) and state that the data were not assessed.
Sometimes, as a best practice, however, you may want to do some
assessment of background data. Your judgment of the data’s importance
and the reliability of the source, as well as other engagement factors,
can help you determine the extent of such an assessment.
Finally, for financial audits and information system reviews, you
should not follow this guidance in assessing data reliability. For
financial audits, which include financial statement and financial-
related audits, you should follow the GAO/PCIE Financial Audit Manual
(FAM) and the Federal Information System Controls Audit Manual
(FISCAM). In an information system review, all controls in a computer
system, for the full range of application functions and products, are
assessed and tested. Such a review includes (1) examining the general
and application controls of a computer system, [Footnote 3] (2) testing
whether those controls are being complied with, and (3) testing data
produced by the system. [Footnote 4] To design such a review,
appropriate to the research question, seek assistance from information
technology specialists.
[End of section]
Section 4: Performing a Data Reliability Assessment:
To perform a data reliability assessment, you need to decide on the
timing—when to perform the assessment—and how to document it.
Timing the Assessment:
A data reliability assessment should be performed as early as possible
in the engagement process, preferably during the design phase. The audit
plan should reflect data reliability issues and any additional steps
that still need to be performed to assess the reliability of critical
data. The engagement team generally should not finalize the audit plan
or issue a commitment letter until it has done initial testing and
reviewed existing information about the data and the system that
produces the data. In addition, the team should not commit to making
conclusions or recommendations based on the data unless the team
expects to be satisfied with the data reliability.
Documenting the Assessment:
All work performed as part of the data reliability assessment should be
documented and included in the engagement workpapers. This includes all
testing, information review, and interviews related to data
reliability. In addition, decisions made during the assessment,
including the final assessment of whether the data are sufficiently
reliable for the purposes of the engagement, should be summarized and
included with the workpapers. These workpapers should be (1) clear
about what steps the team took and what conclusions they reached and
(2) reviewed by staff with appropriate skills or, if needed, technical
specialists.
[End of section]
Section 5: Viewing the Entire Assessment Process:
The ultimate goal of the data reliability assessment is to determine
whether you can use the data to answer the research question. The
assessment should be performed only for those portions of the data that
are relevant to the engagement. The extensiveness of the assessment is
driven by:
* the expected significance of the data to the final report;
* the anticipated risk level of using the data, and;
* the strength or weakness of any corroborating evidence.
Therefore, the specific assessment process should take into account
these factors along with what is learned during the initial stage of the
assessment. The process is likely to be different for each engagement.
The overall framework of the process for data reliability assessment is
shown in figure 3. The framework identifies several key stages in the
assessment, as well as actions and decisions expected as you move
through the process. The framework allows you to identify the
appropriate mix of assessment steps to fit the particular needs of your
engagement. In most cases, not all of the elements in figure 3 will be
necessary to complete the assessment. Specific actions for each stage
are discussed in sections 6-10.
Figure 3: Data Reliability Assessment Process:
[See PDF for image]
This figure is a flow chart of the Data Reliability Assessment Process,
depicting the following information:
Taking the First Steps:
* What is known about the data and the system?
* Obtain electronic or hard copy data;
* Action: Perform initial testing;
* Action: Review existing information about the data and the system.
Making the Preliminary Assessment:
* What is the preliminary assessment of reliability?
- Sufficiently reliable to answer research question: Use data and
disclose any limitations;
- Not sufficiently reliable to answer research question: Take optional
actions;
- Undetermined.
Conducting Additional Work:
Consider these factors:
* Anticipated significance of the data in answering the research
question; Strength of corroborating evidence; Degree of risk involved.
What is the most appropriate mix of additional work?
Some options for additional work:
* Combination of actions: Trace to or from source documents; Use
advanced electronic testing; Review selected system controls.
Making the Final Assessment:
* What is the final assessment of reliability?
- Sufficiently reliable to answer research question: Use data and
disclose any limitations;
- Not sufficiently reliable to answer research question: Take optional
actions.
Source: GAO.
[End of figure]
[End of section]
Section 6: Taking the First Steps:
The data reliability process begins with two relatively simple steps.
These steps provide the basis for making a preliminary assessment of
data reliability: (1) a review of related information and (2) initial
testing (see figure 4). In some situations, you may have an extremely
short time frame for the engagement; this section also provides some
advice for this situation.
The time required to review related information and perform initial
testing will vary, depending on the engagement and the amount of risk
involved. As discussed in section 4, these steps should take place
early in the engagement and include the team members, as well as
appropriate technical staff.
Figure 4: The First Steps of the Assessment:
[See PDF for image]
This figure is a flow chart of the First Steps of the Assessment,
depicting the following information:
What is known about the data and the system?
* Obtain electronic or hard copy data;
* Action: Perform initial testing;
* Action: Review existing information about the data and the system.
Source: GAO.
[End of figure]
Reviewing Existing Information:
The first step—a review of existing information—helps you to determine
what is already known about the data and the computer processing. The
related information you collect can indicate both the accuracy and
completeness of the entry and processing of the data, as well as how
data integrity is maintained. This information can be in the form of
reports, studies, or interviews with individuals who are knowledgeable
about the data and the system. Sources for related information include
GAO, the agency under review, and others.
GAO:
GAO may already have related information in its reports. Those from
fiscal year 1995 to the present are available via GAO’s Internet site.
This site also provides other useful information: for example, as part
of the annual governmentwide consolidated financial audit, GAO’s
Information Technology Team is involved with reporting on the
effectiveness of controls for financial information systems at 24 major
federal agencies.
Agency under Review:
Officials of the agency or entity under review are aware of evaluations
of their computer data or systems and usually can direct you to both.
However, keep in mind that information from agency officials may be
biased. Consider asking appropriate technical specialists to help in
evaluating this information. Agency information includes Inspector
General reports, Federal Managers’ Financial Integrity Act reports,
Government Performance and Results Act (GPRA) plans and reports,
Clinger-Cohen Act reports, and Chief Information Officer reports. (Some
of this information can be found in agency homepages on the Web.)
Others:
Other organizations and users of the data may be sources of relevant
information. To help you identify these sources, you can use a variety
of databases and other research tools, which include the Congressional
Research Service Public Policy Literature Abstracts and organizations'
Web sites.
Performing Initial Testing:
The second step—initial testing—can be done by applying logical tests to
electronic data files or hard copy reports. For electronic data, you use
computer programs to test all entries of key data elements in the entire
data file. [Footnote 5] Keep in mind that you only test those data
elements you plan to use for the engagement. You will find that testing
with computer programs often takes less than a day, depending on the
complexity of the file. For hard copy or summarized data—provided by
the audited entity or retrieved from the Internet—you can ask for the
electronic data file used to create the hard copy or summarized data.
If you are unable to obtain electronic data, use the hard copy or
summarized data and, to the extent possible, manually apply the tests
to all instances of key data elements or, if the report or summary is
voluminous, to a sample of them.
Whether you have an electronic data file or a hard copy report or
summary, you apply the same types of tests to the data. These can
include testing for:
* missing data, either entire records or values of key data elements;
* the relationship of one data element to another;
* values outside of a designated range; and;
* dates outside valid time frames or in an illogical progression.
Be sure to keep a log of your testing for inclusion in the engagement
workpapers.
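To illustrate, the following is a minimal sketch of such tests written
in Python with the pandas library (one of many tools that could be
used). The file name, data elements, designated range, and business
rule are hypothetical examples, not prescribed values; substitute the
key data elements and rules for your own engagement.

import pandas as pd

# Load the data file to be tested (hypothetical file and element names).
df = pd.read_csv("cases.csv", parse_dates=["open_date", "close_date"])

# Missing data: count empty values of key data elements.
print(df[["case_id", "amount", "open_date"]].isna().sum())

# Values outside of a designated range (range assumed for illustration).
out_of_range = df[(df["amount"] < 100) | (df["amount"] > 5_000_000)]
print("Records with out-of-range amounts:", len(out_of_range))

# Dates in an illogical progression: cases closed before they opened.
print("Illogical dates:", (df["close_date"] < df["open_date"]).sum())

# Relationship of one data element to another: closed cases should
# carry a disposition code (an assumed business rule).
bad = df[(df["status"] == "closed") & (df["disposition"].isna())]
print("Closed cases lacking a disposition code:", len(bad))

Each test prints its results, which can be copied into the testing log
described above.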
Dealing with Short Time Frames:
In some instances, the engagement may have a time frame that is too
short for a complete preliminary assessment, for example, a request for
testimony in 2 weeks. However, because every engagement is a function
of time, scope, and resources, a limitation in one must be balanced
against the others.
Despite a short time frame, you may have time to review existing
information and carry out testing of data that are critical for
answering a research question, for example: You can question
knowledgeable agency staff about data reliability or review existing
GAO or Inspector General reports to quickly gather information about
data reliability issues. In addition, electronic testing of critical
data elements for obvious errors of completeness and accuracy can
generally be done in a short period of time on all but the most
complicated or immense files. From that review and testing, you will be
able to make a more informed determination about whether the data are
sufficiently reliable to use for the purposes of the engagement. (See
sections 7 and 8 for the actions to take, depending on your
determination.)
[End of section]
Section 7: Making the Preliminary Assessment:
The preliminary assessment is the first decision point in the
assessment process. It involves considering multiple factors,
determining the sufficiency of the data reliability given what is known
at this point, and deciding whether further work is required. You will
decide whether the data are sufficiently reliable
for the purposes of the engagement, not sufficiently reliable, or as
yet undetermined. Keep in mind that you are not attesting to the
overall reliability of the data or database. You are only determining
the reliability of the data as needed to support the findings,
conclusions, or recommendations of the engagement. As you gather
information and make your judgments, consult appropriate technical
specialists for assistance.
Factors to Consider in the Assessment:
To make the preliminary assessment of the sufficiency of the data
reliability for the engagement, you should consider all factors related
to aspects of the engagement, as well as assessment work performed to
this point. As shown in figure 5, these factors include:
* the expected significance of the data in the final report;
* corroborating evidence;
* level of risk, and;
* the results of initial assessment work.
Figure 5: The Preliminary Assessment:
[See PDF for image]
This figure is a flow chart of the preliminary assessment, depicting
the following information:
Factors:
* Results of initial testing;
* Results of review of existing information;
* Anticipated significance of the data in answering the research
question;
* Strength of corroborating evidence;
* Degree of risk involved.
All factors assist in answering the question:
What is the preliminary assessment of reliability?
* Sufficiently reliable: Use data and disclose limitations, if any;
* Not sufficiently reliable: Take optional actions;
* Undetermined.
Source: GAO.
[End of figure]
Expected Significance of the Data in the Final Report:
In making the preliminary assessment, consider the data in the context
of the final report: Will the engagement team depend on the data alone
to answer a research question? Will the data be summarized or will
detailed information be required? Is it important to have precise data,
making magnitude of errors an issue?
Corroborating Evidence:
You should consider the extent to which corroborating evidence is
likely to exist and will independently support your findings,
conclusions, or recommendations. Corroborating evidence is independent
evidence that supports information in the database. Such evidence, if
available, can be found in the form of alternative databases or expert
views. It is unique to each engagement, and its
strength—persuasiveness—varies.
For help in deciding the strength or weakness of corroborating evidence,
consider the extent to which the corroborating evidence:
* is consistent with the "Yellow Book" standards of evidence—
sufficiency, competence, and relevance;
* provides crucial support;
* is drawn from different types of sources—testimonial, documentary,
physical, or analytical; and;
* is independent of other sources.
Level of Risk:
Risk is the likelihood that using data of questionable reliability
could have significant negative consequences on the decisions of
policymakers and others. To do a risk assessment, consider the
following risk conditions:
* The data could be used to influence legislation, policy, or a program
that could have significant impact.
* The data could be used for significant decisions by individuals or
organizations with an interest in the subject.
* The data will be the basis for numbers that are likely to be widely
quoted, for example, "In 1999, the United States owed the United
Nations about $1.3 billion for the regular and peacekeeping budgets."
* The engagement is concerned with a sensitive or controversial
subject.
* The engagement has external stakeholders who have taken positions on
the subject.
* The overall engagement risk is medium or high.
* The engagement has unique factors that strongly increase risk.
Bear in mind that any one of the conditions may have more importance
than another, depending on the engagement.
Results of Initial Assessment Work:
At this point, as shown in figure 5, the team will already have
performed the initial stage of the data reliability assessment. They
should have the results from the (1) review of all available existing
information about the data and the system that produced them and (2)
initial testing of the critical data elements. These results should be
appropriately documented and reviewed before the team enters into the
decision-making phase of the preliminary assessment. Because the
results will, in whole or in part, provide the evidence that the data
are sufficiently reliable—and therefore competent enough—or not
sufficiently reliable for the purposes of the engagement, the
workpapers should include documentation of the process and results.
Outcomes to Consider in the Assessment:
The results of your combined judgments of the strength of corroborating
evidence and degree of risk suggest different assessments. If the
corroborating evidence is strong and the risk is low, the data are more
likely to be considered sufficiently reliable for your purposes. If the
corroborating evidence is weak and the risk is high, the data are more
likely to be considered not sufficiently reliable for your purposes. The
overall assessment is a judgment call, which should be made in the
context of discussion with team management and technical specialists.
The preliminary assessment categorizes the data as sufficiently
reliable, not sufficiently reliable, or of undetermined reliability.
Each category has implications for the next steps of the data
reliability assessment.
When to Assess Data as Sufficiently Reliable for Engagement Purposes:
You can assess the data as sufficiently reliable for engagement purposes
when you conclude the following: Both the review of related information
and the initial testing provide assurance that (1) the likelihood of
significant errors or incompleteness is minimal and (2) the use of the
data would not lead to an incorrect or unintentional message. You could
have some problems or uncertainties about the data, but they would be
minor, given the research question and intended use of the data. When
the preliminary assessment indicates that the data are sufficiently
reliable, use the data.
When to Assess Data as Not Sufficiently Reliable for Engagement
Purposes:
You can assess the data as not sufficiently reliable for engagement
purposes when you conclude the following: The review of related
information or initial testing indicates that (1) significant errors or
incompleteness exist in some or all of the key data elements and (2)
using the data would probably lead to an incorrect or unintentional
message.
When the preliminary assessment indicates that the data are not
sufficiently reliable, you should seek evidence from other sources,
including (1) alternative computerized data—the reliability of which you
should also assess—or (2) original data in the form of surveys, case
studies, or expert interviews.
You should coordinate with the requester if seeking evidence from other
sources does not result in a source of sufficiently reliable data.
Inform the requester that such data, needed to respond to the request,
are unavailable. Reach an agreement with the requester to:
* redefine the research questions to eliminate the need to use the
data;
* end the engagement, or;
* use the data with appropriate disclaimers.
Remember that you—not the requester—are responsible for deciding what
data to use. If you decide you must use data that you have determined
are not sufficiently reliable for the purposes of the engagement, make
the limitations of the data clear, so that incorrect or unintentional
conclusions will not be drawn. Finally, given that the data you
assessed have serious reliability weaknesses, you should include this
finding in the report and recommend that the agency take corrective
action.
When to Assess Data as of Undetermined Reliability and Consider
Additional Work:
You can assess the data as of undetermined reliability when you conclude
one of the following:
* The review of some of the related information or initial testing
raises questions about the data’s reliability.
* The related information or initial testing provides too little
information to judge reliability.
* Time or resource constraints limit the extent of the examination
of related information or initial testing.
When the preliminary assessment indicates that the reliability of the
data is undetermined, consider doing additional work to determine
reliability. Section 8 provides guidance on the types of additional
work to consider, as well as suggestions if no additional work is
feasible.
[End of section]
Section 8: Conducting Additional Work:
When you have determined (through the preliminary assessment) that the
data are of undetermined reliability, consider conducting additional
work (see figure 6). A range of additional steps to further determine
data reliability includes tracing to and from source documents, using
advanced electronic testing, and reviewing selected system controls.
The mix depends on what weaknesses you identified in the preliminary
assessment and the circumstances specific to your engagement, such as
risk level and corroborating evidence, as well as other factors. Focus
particularly on those aspects of the data that pose the greatest
potential risk for your engagement. You should get help from
appropriate technical specialists to discuss whether additional work is
required and to carry out any part of the additional reliability
assessment.
Figure 6: Choosing and Conducting Additional Work:
[See PDF for image]
This figure is a flow chart of the process of choosing and conducting
additional work. The following information is depicted:
What is the most appropriate mix of additional work?
Factors:
* Results of initial testing;
* Results of review of existing information;
* Consider these factors:
- Anticipated significance of the data in answering the research
question;
- Strength of corroborating evidence;
- Degree of risk involved;
* Some options for additional work:
- Trace to or from source documents;
- Use advanced electronic testing;
- Review selected system controls.
Source: GAO.
[End of figure]
Tracing to and from Source Documents:
Tracing a sample of data records to source documents helps you to
determine whether the computer data accurately and completely reflect
these documents. In deciding what and how to trace, consider the
relative risks to the engagement of overstating or understating the
conclusions drawn from the data, for example: On the one hand, if you
are particularly concerned that questionable cases might not have been
entered into the computer system and that as a result, the degree of
compliance may be overstated, you should consider tracing from source
documents to the database. On the other hand, if you are more concerned
that ineligible cases have been included in the database and that as a
result, the potential problems may be understated, you should consider
tracing from the database back to source documents.
Tracing only a sample, rather than every record, saves time and cost.
To be useful, however, the sample should be random and large
enough to estimate the error rate within reasonable levels of
precision. Tracing a random sample will provide the error rate and the
magnitude of errors for the entire data file. It is this error rate
that helps you to determine the data reliability. Generally, every data
file will have some degree of error (see example 1 for error rate and
example 2 for magnitude of errors). Consult statisticians to assist you
in selecting the sampling method most suited to the engagement.
Example 1: According to a random sample, 10 percent of the data records
have incorrect dates. However, the dates may be off by an average of
only 3 days. Depending on what the data are used for, 3 days may not
compromise reliability.
Example 2: The value of a data element was incorrectly entered as
$100,000, rather than $1,000,000. The documentation of the database
shows that the acceptable range for this data element is between $100
and $5,000,000. Therefore, the electronic testing done in the initial
testing phase would have confirmed that the value of $100,000 fell
within that range. In this case, the error could be caught, not by
electronic testing, but only by tracing the data to source documents.
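To show how tracing results might be quantified, here is a minimal
sketch in Python. The sample size and error count are hypothetical, and
the normal approximation to the binomial is only one possible method;
consult statisticians on the design appropriate to your engagement.

import math

# Hypothetical tracing results: of 370 randomly sampled records,
# 37 disagreed with their source documents.
n_sampled = 370
n_errors = 37

error_rate = n_errors / n_sampled

# Approximate 95 percent confidence interval for the error rate
# (normal approximation to the binomial).
margin = 1.96 * math.sqrt(error_rate * (1 - error_rate) / n_sampled)

print(f"Estimated error rate: {error_rate:.1%} +/- {margin:.1%}")

With these hypothetical figures, the estimate is 10.0 percent plus or
minus 3.1 percentage points—an estimate of the error rate for the
entire data file, as in example 1.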
Tracing to Source Documents:
Consider tracing to source documents when (1) the source documents are
available relatively easily or (2) the possible magnitude of errors is
especially critical.
To trace a sample to source documents, match the entered data with the
corresponding data in the source documents. In attempting to trace
entered data back to source documents, however, a common problem can
arise: source documents may not be available because they were
destroyed, were never created, or are not centrally located.
Several options exist if source documents are not available. For
documents that were never created—for example, when data are based on
electronic submissions—rely instead on interviews to obtain related
information, on any corroborating evidence obtained earlier, or on a
review of the adequacy of system controls.
Tracing from Source Documents:
Consider tracing from source documents, instead of or in addition to
tracing a sample to source documents, when you have concerns that the
data are not complete. To trace a sample from source documents, match
the source documents with the entered data. Such tracing may be
appropriate to determine whether all data are completely entered.
However, if source documents were never created or are now missing, you
cannot identify the missing data.
Using Advanced Electronic Testing:
Advanced electronic testing goes beyond the basic electronic testing
that you did in initial testing (see section 6). It generally requires
specialized computer programs to test for specific conditions in the
data. Such testing can be particularly helpful in determining the
accuracy and completeness of processing by the application system that
produced the data. Consider using advanced electronic testing for:
* following up on troubling aspects of the data—such as extremely high
values associated with a certain geographic location—found in initial
testing or while analyzing the data;
* testing relationships—cross-tabulation—between data elements, such
as whether data elements follow a skip pattern from a questionnaire;
and;
* verifying that computer processing is accurate and complete, such as
testing a formula used in generating specific data elements.
Depending on what will be tested, this testing can require a range of
programming skills—from creating cross-tabulations on related data
elements to duplicating an intricate automated process with more
advanced programming techniques. Consult appropriate technical
specialists, as needed.
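As an illustration only, a minimal sketch of two such tests in Python
with pandas follows. The data elements, the skip pattern, and the
formula being verified are all hypothetical.

import pandas as pd

df = pd.read_csv("responses.csv")  # hypothetical data file

# Cross-tabulation between data elements: respondents who answered
# "no" to question 5 should have skipped question 6 (assumed rule).
print(pd.crosstab(df["q5"], df["q6"].notna()))

# Verify computer processing: "total_pay" should equal base pay plus
# locality pay (a hypothetical formula used to generate the element).
recomputed = df["base_pay"] + df["locality_pay"]
mismatches = (recomputed - df["total_pay"]).abs() > 0.005
print("Records where total_pay does not match the formula:",
      mismatches.sum())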
Reviewing Selected System Controls:
Your review of selected system controls—the underlying structures and
processes of the computer in which the data are maintained—can provide
some assurance that the data are sufficiently reliable. Examples of
system controls are limits on access to the system and edit checks on
data entered into the system. Controls can reduce, to an acceptable
level, the risk that a significant mistake could occur and remain
undetected and uncorrected. Limit the review to evaluating the specific
controls that can most directly affect the reliability of the data in
question. Choose areas for review on the basis of what is known about
the system. Sometimes, you identify potential system control problems
in the initial steps of the assessment. Other times, you learn during
the preliminary assessment that source documents are not readily
available. In such cases, a review of selected system controls may be
the best method to determine whether the data were entered reliably. If
needed, consult
information system auditors for help in evaluating general and
application controls.
Using what you know about the system, concentrate on evaluating the
controls that most directly affect the data. These controls will usually
include (1) certain general controls, such as logical access and
control of changes to the data, and (2) the application controls that
help to ensure that the data are accurate and complete, as well as
authorized. The steps for reviewing selected system controls are:
* gain a detailed understanding of the system as it relates to the data
and;
* identify and assess the application and general controls that are
critical to ensuring the reliability of the data required for the
engagement.
Using Data of Undetermined Reliability:
In some situations, it may not be feasible to perform any additional
work, for example, when (1) the time frame is too short for a complete
assessment, (2) original computer files have been deleted, or (3)
access to needed documents is unavailable. See section 9 for how to
proceed.
[End of section]
Section 9: Making the Final Assessment:
During the final assessment, you should consider the results of all your
previous work to determine whether, for your intended use, the data are
sufficiently reliable, not sufficiently reliable, or still
undetermined. Again, remember that you are not attesting to the
reliability of the data or database. You are only determining the
sufficiency of the reliability of the data for your intended use. The
final assessment will help you decide what actions to take (see figure
7).
Figure 7: Making the Final Assessment:
[See PDF for image]
This figure is a flow chart of the process for making the final
assessment. The following information is depicted:
Factors to consider:
* Significance of the data in answering the research question;
* Results of initial testing;
* Strength of corroborating evidence;
* Results of review of existing information;
* Degree of risk involved;
* Results of any additional work;
What is the final assessment of reliability?
Sufficiently reliable: Use data and disclose limitations, if any;
Not sufficiently reliable: Take optional actions.
Source: GAO.
[End of figure]
The following are some considerations to help you decide whether you can
use the data:
* The corroborating evidence is strong.
* The degree of risk is low.
* The results of additional assessment (1) resolved issues raised in the
preliminary assessment and (2) did not raise any new questions.
* The error rate, in tracing to or from source documents, did not
compromise reliability.
In making this assessment, you should consult with appropriate technical
specialists.
Sufficiently Reliable Data:
You can consider the data sufficiently reliable when you conclude the
following: On the basis of the additional work, as well as the initial
assessment work, using the data would not weaken the analysis or lead
to an incorrect or unintentional message. You could have some problems
or uncertainties about the data, but they would be minor, given the
research question and intended use of the data. When your final
assessment indicates that the data are sufficiently reliable, use the
data.
Not Sufficiently Reliable Data:
You can consider the data to be not sufficiently reliable when you
conclude the following: On the basis of information drawn from the
additional assessment, as well as the preliminary assessment, (1) using
the data would most likely lead to an incorrect or unintentional
message and (2) the data have significant or potentially significant
limitations, given the research question and intended use of the data.
When you determine that the data are not sufficiently reliable, you
should inform the requester that sufficiently reliable data, needed to
respond to the request, are unavailable. Remember that you—not the
requester—are responsible for deciding what data to use. Although the
requester may want information based on insufficiently reliable data,
you are responsible for ensuring that data are used appropriately to
respond to the requester. If you decide to use the data for the report,
make the limitations of the data clear, so that incorrect or
unintentional conclusions will not be drawn. Appropriate team
management should be consulted before you agree to use data that are
not sufficiently reliable.
Finally, given that the data you assessed have serious reliability
weaknesses, you should include this finding in the report and recommend
that the agency take corrective action.
Data of Undetermined Reliability:
You can consider the data to be of undetermined reliability when you
conclude the following: On the basis of the information drawn from any
additional work, as well as the preliminary assessment, (1) use of the
data could lead to an incorrect or unintentional message and (2) the
data have significant or potentially significant limitations, given the
research question and the intended use. You can consider the data to be
of undetermined reliability if specific factors—such as short time
frames, the deletion of original computer files, and the lack of access
to needed documents—are present. If you decide to use the data, make
the limitations of the data clear, so that incorrect or unintentional
conclusions will not be drawn.
As noted above in the case of not sufficiently reliable data, when you
determine that the data are of undetermined reliability, you should
inform the requester—if appropriate—that sufficiently reliable data,
needed to respond to the request, are unavailable. Remember that
you—not the requester—are responsible for deciding what data to use.
Although the requester may want information based on data of
undetermined reliability, you are responsible for ensuring that
appropriate data are used to respond to the requester. If you decide to
use the data in your report, make the limitations clear, so that
incorrect or unintentional conclusions will not be drawn.
Appropriate team management should be consulted before you agree to use
data of undetermined reliability.
[End of section]
Section 10: Including Appropriate Language in the Report:
In the report, you should include a statement in the methodology section
about conformance to generally accepted government auditing standards
(GAGAS). These standards refer to how you did your work, not how
reliable the data are. Therefore, you are conforming to GAGAS as long
as, in reporting, you discuss what you did to assess the data; disclose
any data concerns; and reach a judgment about the reliability of the
data for use in the report.
Furthermore, in the methodology section, include a discussion of your
assessment of data reliability and the basis for this assessment. The
language in this discussion will vary, depending on whether the data are
sufficiently reliable, not sufficiently reliable, or of undetermined
reliability. In addition, you may need to discuss the reliability of
the data in other sections of the report. Whether you do so depends on
the importance of the data to the message.
Sufficiently Reliable Data:
Present your basis for assessing the data as sufficiently reliable,
given the research questions and intended use of the data. This
presentation includes (1) noting what kind of assessment you relied on,
(2) explaining the steps in the assessment, and (3) disclosing any data
limitations. Such disclosure includes:
* telling why using the data would not lead to an incorrect or
unintentional message;
* explaining how limitations could affect any expansion of the message,
and;
* pointing out that any data limitations are minor in the context of the
engagement.
Not Sufficiently Reliable Data:
Present your basis for assessing the data as not sufficiently reliable,
given the research questions and intended use of the data. This
presentation should include what kind of assessment you relied on, with
an explanation of the steps in the assessment.
In this explanation, (1) describe the problems with the data, as well
as why using the data would probably lead to an incorrect or
unintentional message, and (2) state that the data problems are
significant or potentially significant. In addition, if the report
contains a conclusion or recommendation supported by evidence other
than these data, state that fact. Finally, if the data you assessed are
not sufficiently reliable, you should include this finding in the
report and recommend that the audited entity take corrective action.
Data of Undetermined Reliability:
Present your basis for assessing the reliability of the data as
undetermined. Include such factors as short time frames, the deletion
of original computer files, and the lack of access to needed documents.
Explain the reasonableness of using the data, for example: These are
the only available data on the subject; the data are widely used by
outside experts or policymakers; or the data are supported by credible
corroborating evidence. In addition, make the limitations of the data
clear, so that incorrect or unintentional conclusions will not be drawn
from the data. For example, indicate how the use of these data could
lead to an incorrect or unintentional message. Finally, if the report
contains a conclusion or recommendation supported by evidence other
than these data, state that fact.
[End of section]
Glossary of Technical Terms:
accuracy: Freedom from error in the data.
completeness: The inclusion of all necessary parts or elements.
database: A collection of related data files (for example,
questionnaire responses from several different groups of people, with
each group’s identity maintained).
data element: An individual piece of information that has definable
parameters, sometimes referred to as variables or fields (for example,
the response to any question in a questionnaire).
data file: A collection of related data records, also referred to as a
data set (for example, the collected questionnaire responses from a
group of people).
data record: A collection of data elements that relate to a specific
event, transaction, or occurrence (for example, questionnaire responses
about one individual—such as age, sex, and marital status).
source document: Information that is the basis for entry of data into
a computer.
[End of section]
Footnotes:
[1] U.S. General Accounting Office, Government Auditing Standards,
GAO/OGC-94-4 (Washington, D.C.: June 1994), pp. 62-87.
[2] A data element is a unit of information with definable parameters
(for example, a Social Security number), sometimes referred to as a
data variable or data field.
[3] General controls refers to the structure, policies, and procedures—
which apply to all or a large segment of an organization’s information
systems—that help to ensure proper operation, data integrity, and
security. Application controls refers to the structure, policies, and
procedures that apply to individual application systems, such as
inventory or payroll.
[4] Guidance for carrying out reviews of general and application
controls is provided in the U.S. General Accounting Office, Federal
Information System Controls Audit Manual, GAO/AIMD-12.19.6 (Washington,
D.C.: Jan. 1999).
[5] Though an in-depth discussion of quality-assurance practices to be
used in electronic testing and analyses is beyond the scope of this
guidance, it is important to perform appropriate checks to ensure that
you have obtained the correct file. All too often, analysts receive an
incorrect file (an early version or an incomplete file). Appropriate
steps would include counting records and comparing totals with the
responsible agency or entity.
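For instance, a minimal sketch of such a check in Python with pandas,
assuming the responsible agency has separately reported a record count
and a control total for a key data element (both figures hypothetical):

import pandas as pd

df = pd.read_csv("extract.csv")  # hypothetical file received

# Figures reported separately by the agency (hypothetical values).
agency_record_count = 125_000
agency_amount_total = 4_217_350_912.55

print("Record count matches:", len(df) == agency_record_count)
print("Control total matches:",
      abs(df["amount"].sum() - agency_amount_total) < 0.01)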
[End of section]
GAO’s Mission:
The General Accounting Office, the investigative arm of Congress,
exists to support Congress in meeting its constitutional
responsibilities and to help improve the performance and accountability
of the federal government for the American people. GAO examines the use
of public funds; evaluates federal programs and policies; and provides
analyses, recommendations, and other assistance to help Congress make
informed oversight, policy, and funding decisions. GAO’s commitment to
good government is reflected in its core values of accountability,
integrity, and reliability.
Obtaining Copies of GAO Reports and Testimony:
The fastest and easiest way to obtain copies of GAO documents at no
cost is through the Internet. GAO’s Web site [hyperlink,
http://www.gao.gov] contains abstracts and full text files of current
reports and testimony and an expanding archive of older products. The
Web site features a search engine to help you locate documents using
key words and phrases. You can print these documents in their entirety,
including charts and other graphics.
Each day, GAO issues a list of newly released reports, testimony, and
correspondence. GAO posts this list, known as “Today’s Reports,” on its
Web site daily. The list contains links to the full-text document
files. To have GAO e-mail this list to you every afternoon, go to
[hyperlink, http://www.gao.gov] and select “Subscribe to daily E-mail
alert for newly released products” under the “Order GAO Products”
heading.
Order by Mail or Phone:
The first copy of each printed report is free. Additional copies are $2
each. A check or money order should be made out to the Superintendent
of Documents. GAO also accepts VISA and MasterCard. Orders for 100 or
more copies mailed to a single address are discounted 25 percent.
Orders should be sent to:
U.S. General Accounting Office:
441 G Street NW, Room LM:
Washington, D.C. 20548:
To order by Phone:
Voice: (202) 512-6000:
TDD: (202) 512-2537
Fax: (202) 512-6061
To Report Fraud, Waste, and Abuse in Federal Programs Contact:
Web site: [hyperlink, http://www.gao.gov/fraudnet/fraudnet.htm]:
E-mail: fraudnet@gao.gov:
Automated answering system: (800) 424-5454 or (202) 512-7470:
Public Affairs:
Jeff Nelligan, managing director, NelliganJ@gao.gov:
(202) 512-4800:
U.S. General Accounting Office:
441 G Street NW, Room 7149:
Washington, D.C. 20548: