GAO-05-343, Head Start: Further Development Could Allow Results of New Test to Be Used for Decision Making


This is the accessible text file for GAO report number GAO-05-343 
entitled 'Head Start: Further Development Could Allow Results of New 
Test to Be Used for Decision Making' which was released on May 17, 
2005. 

This text file was formatted by the U.S. Government Accountability 
Office (GAO) to be accessible to users with visual impairments, as part 
of a longer term project to improve GAO products' accessibility. Every 
attempt has been made to maintain the structural and data integrity of 
the original printed product. Accessibility features, such as text 
descriptions of tables, consecutively numbered footnotes placed at the 
end of the file, and the text of agency comment letters, are provided 
but may not exactly duplicate the presentation or format of the printed 
version. The portable document format (PDF) file is an exact electronic 
replica of the printed version. We welcome your feedback. Please E-mail 
your comments regarding the contents or accessibility features of this 
document to Webmaster@gao.gov. 

This is a work of the U.S. government and is not subject to copyright 
protection in the United States. It may be reproduced and distributed 
in its entirety without further permission from GAO. Because this work 
may contain copyrighted images or other material, permission from the 
copyright holder may be necessary if you wish to reproduce this 
material separately. 

Report to Congressional Requesters: 

United States Government Accountability Office: 

May 2005: 

Head Start: 

Further Development Could Allow Results of New Test to Be Used for 
Decision Making: 

GAO-05-343: 

GAO Highlights: 

Highlights of GAO-05-343, a report to congressional requesters:

Why GAO Did This Study: 

In September 2003, the Head Start Bureau, in the Department of Health 
and Human Services (HHS) Administration for Children and Families 
(ACF), implemented the National Reporting System (NRS), the first 
nationwide skills test of over 400,000 4- and 5-year-old children. The 
NRS is intended to provide information on how well Head Start grantees 
are helping children progress. 

Given the importance of the NRS, this report examines: what information 
the NRS is designed to provide; how the Head Start Bureau has responded 
to concerns raised by grantees and experts during the first year of 
implementation; and whether the NRS provides the Head Start Bureau with 
quality information. 

What GAO Found: 

The Head Start Bureau developed the NRS to gauge the extent to which 
Head Start grantees help children progress in specific skill areas, 
including understanding spoken English, recognizing letters, 
vocabulary, and early math. Due to time constraints and technical 
matters, the Head Start Bureau adapted portions of other assessments 
for use in the NRS. 

Head Start Bureau officials have responded to some concerns raised 
during the first year of NRS implementation, but other issues remain. 
For example, the Head Start Bureau has modified training materials and 
is exploring the feasibility of sampling. However, it is not monitoring 
whether grantees are inappropriately changing instruction to emphasize 
areas covered in the NRS. 

Head Start Bureau officials have said NRS results will eventually be 
used for program improvement, targeting training and technical 
assistance, and program accountability; however, the Head Start Bureau 
has not stated how NRS results will be used to realize these purposes. 
Currently, results from the first year of the NRS are of limited value 
for accountability purposes because the Head Start Bureau has not shown 
that the NRS meets professional standards for such uses, namely that 
(1) the NRS provides reliable information on children’s progress during 
the Head Start program year, especially for Spanish-speaking children, 
and (2) its results are valid measures of the learning that takes 
place. The NRS also may not provide sufficient information to target 
technical assistance to the Head Start centers and classrooms that need 
it most. 

An Assessor and Head Start Student Demonstrate the NRS Assessment: 

[See PDF for image]

[End of figure]

What GAO Recommends: 

GAO recommends the HHS Assistant Secretary for ACF, in collaboration 
with the Head Start Bureau, determine how NRS data will be used for 
accountability and targeting technical assistance; monitor the effects 
of the NRS on local Head Start practices; use first year NRS results to 
conduct further study of the reliability and validity of the NRS; 
compile a detailed, well-organized document on the technical quality of 
the NRS; improve management of its data on NRS participation; and study 
the costs and benefits of sampling in administering the NRS. ACF 
generally agreed with our recommendations. 

www.gao.gov/cgi-bin/getrpt?GAO-05-343. 

To view the full product, including the scope and methodology, see the 
link above. For more information, contact Marnie S. Shaul at (202) 
512-7215 or shaulm@gao.gov. 

[End of section]

Contents: 

Letter: 

Results in Brief: 

Background: 

NRS Assesses Selected Skills Using Adaptations of Other Assessments: 

The Head Start Bureau Has Been Responsive to Some Implementation Issues 
Raised during First Year of NRS, but Others Remain: 

The Head Start Bureau Has Not Specified How NRS Results Will Be Used 
and Important Analyses Remain to Be Done: 

Conclusions: 

Recommendations for Executive Action: 

Agency Comments and Our Evaluation: 

Appendix I: Objectives, Scope and Methodology: 

Appendix II: Survey Instrument: 

Appendix III: Comments from the Department of Health and Human 
Services: 

Appendix IV: GAO Contacts and Staff Acknowledgments: 

Tables: 

Table 1: Examples of Information Included in Computer-Based Reporting 
System (CBRS): 

Table 2: Description of NRS Components and Their Modifications: 

Table 3: Sample Disposition: 

Figures: 

Figure 1: Head Start Grantees, Delegate Agencies, and Centers: 

Figure 2: Timeline of Events Leading to Implementation of NRS: 

Figure 3: Example of NRS Letter Naming Instructions and Task: 

Figure 4: Example of NRS Early Math Skills Instructions and Task: 

Figure 5: Example of Type of Vocabulary Instructions and Task Used in 
the NRS: 

Abbreviations: 

ACF: Administration for Children and Families: 
CBRS: Computer-Based Reporting System: 
ECLS-K: Early Childhood Longitudinal Study of a Kindergarten cohort: 
HHS: U.S. Department of Health and Human Services: 
HSB: Head Start Bureau: 
NAEYC: National Association for the Education of Young Children: 
NAS: National Academy of Sciences: 
NHSA: National Head Start Association: 
NRS: National Reporting System: 
OLDS: Oral Language Development Scale: 
PPVT: Peabody Picture Vocabulary Test: 
Pre-LAS 2000: Pre-Language Assessment Scale 2000: 
QRC: Head Start Quality Research Centers: 
TWG: Technical Work Group: 

United States Government Accountability Office: 

Washington, DC 20548: 

May 17, 2005: 

The Honorable Edward M. Kennedy: 
Ranking Minority Member: 
Committee on Health, Education, Labor and Pensions: 
United States Senate: 

The Honorable Christopher J. Dodd: 
Ranking Minority Member: 
Subcommittee on Education and Early Childhood Development: 
Committee on Health, Education, Labor and Pensions: 
United States Senate: 

In fall 2003, the federal Head Start program initiated a nationwide 
skills test of over 400,000 4- and 5-year-old children. This test, 
called the Head Start National Reporting System (NRS), is intended to 
meet a long-standing need for systematic information on how well 
specific Head Start grantees are helping children learn. Head Start is 
designed to promote school readiness and healthy development among poor 
preschool children and provides services to nearly 1 million children, 
generally between the ages of 3 and 5, through nearly 1,700 grantees. 
These grantees or their delegates provide services at about 19,000 Head 
Start centers nationally, with each grantee having from 1 to over 100 
centers. For nearly a decade the Head Start Bureau (HSB) and the U.S. 
Department of Health and Human Services (HHS) have been engaged in 
promoting accountability and moving toward a results-oriented 
evaluation of Head Start. The NRS builds on this work. The NRS was 
developed in response to President Bush's April 2002 announcement of 
the "Good Start, Grow Smart" early childhood initiative that directed 
HHS to develop a national accountability system to ensure that every 
Head Start grantee will assess the progress made by children in early 
literacy, language, and numeracy skills. 

Head Start teachers, or others trained as NRS assessors, administer the 
NRS to children individually in the fall and spring of the Head Start 
year. The NRS begins with a game of "Simon Says," lasts about 15 
minutes, and includes four sub-tests designed to screen for 
understanding of spoken English and to assess skills in recognizing 
letters, vocabulary, and early math. During the test, an assessor sits 
across from a child at a table and asks scripted questions of the 
child, and the child responds by verbally identifying or pointing to 
pictures, numbers, or letters that are contained in a 3-ring binder. 
The assessor marks the child's responses on a computer-readable scoring 
sheet. While all of the children are given at least the portion of the 
English-language assessment that screens for understanding of spoken 
English, children whose primary language is Spanish are also assessed 
using a Spanish version of the NRS. Children who speak both English and 
Spanish are given both versions of the NRS and scores from both tests 
are reported separately. 

Although other evaluations of children's skills and Head Start 
performance exist, the NRS differs from them in its scale, type, and 
purpose. The NRS is a standardized test intended for all 
prekindergarten Head Start children. It represents the first time that 
HSB will use children's performance on a standardized test to measure 
how well specific Head Start grantees are helping children progress. 
Many in the Head Start community and beyond agree that it is a laudable 
goal to look at Head Start at the national and grantee levels to 
determine whether Head Start achieves its stated objectives. However, 
there have been significant concerns about whether the NRS, as 
currently composed, is the right way to accomplish this goal. 

Given the importance HSB places on measuring Head Start performance and 
the concerns about the NRS, we examined (1) what information the NRS is 
designed to provide, (2) how HSB has responded to implementation issues 
raised by the Head Start grantees and experts during the first year of 
NRS implementation, and what issues remain to be addressed, and (3) 
whether the NRS provides HSB with the quality of information it needs 
to meet its purposes. 

To answer these questions, we collected and analyzed information from 
multiple sources. To determine what information the NRS is designed to 
provide, we interviewed representatives from HSB, its contractors, and 
early childhood professional organizations and we reviewed documents 
chronicling the steps HSB took in developing the NRS. To examine how 
HSB responded to implementation issues raised by Head Start grantees 
and experts during the first year of NRS implementation and what issues 
remain to be addressed, we interviewed representatives from HSB and 
randomly sampled Head Start grantees and delegates from the population 
of all Head Start grantees and delegates during the 2003-2004 school 
year. We received responses from 80 percent of the grantees and 
delegates we surveyed. We also visited 12 Head Start grantees in 5 
states (Colorado, Maryland, Massachusetts, Rhode Island, and Virginia), 
to interview staff who conducted the assessments and to observe them 
administering the NRS to children. The states and grantees chosen for 
site visits were judgmentally selected to include a range of enrollment 
sizes, types of program, rural and urban locations, and linguistic 
populations. Finally, to examine whether the NRS provides HSB with the 
quality of information it needs to meet its goals, we reviewed the 
professionally accepted standards for test development, interviewed all 
of the members of the Technical Work Group--a team of experts convened 
to assist HSB and its contractors in the design and implementation of 
the NRS--and consulted with individuals recommended by the National 
Academy of Sciences as experts in the areas of test design and the 
educational testing of Spanish-speaking and bilingual children. These 
independent experts reviewed documents provided by HSB and its 
contractors pertaining to the adequacy and appropriateness of the 
assessment. See appendix I for additional information on our scope and 
methodology. We conducted our work between May 2004 and February 2005 
in accordance with generally accepted government auditing standards. 

Results in Brief: 

HSB developed the NRS to gauge the extent to which Head Start grantees 
help children progress in specific academic skill areas. The NRS 
includes materials adapted from other tests and is designed to provide 
information on selected academic skills of children in Head Start. 
Specifically, the NRS probes children's understanding of spoken English 
and skills in vocabulary, letter recognition, and simple math through 
the use of pictures, letters, and numbers. For example, children are 
asked to count marbles pictured on a page and identify the height of a 
teddy bear pictured beside a simple ruler. Children's skills in the 
selected areas are assessed to determine how well participating 
children, as a group, are learning and to identify grantees where 
children are not making the expected progress. 

In response to concerns raised during the first year of NRS 
implementation, HSB has made changes to how the NRS is implemented and 
is considering other changes, although other concerns have not yet been 
addressed. In response to assessors' feedback that the initial training 
instructed assessors to follow the assessment script too rigidly, HSB 
modified some of its training materials to better prepare assessors for 
the situations they encountered when implementing the test. In 
addition, in response to suggestions by Technical Work Group members, 
HSB changed the order in which the Spanish and English assessments are 
administered. HSB is also considering substantive changes, such as 
requiring only a sample of children to take the NRS and adding a 
social-emotional development component to the NRS. According to our survey, 
over 60 percent of grantees found it at least moderately challenging to 
find time to assess all children, and sampling may help to minimize 
this burden. Adding a measure of social-emotional development would 
help to address concerns about the narrow range of skills that the NRS 
tests. While these changes demonstrate HSB's responsiveness to some 
concerns raised, the Bureau has yet to address other potential 
implementation problems, such as whether all 4- and 5-year-olds 
eligible to participate in the NRS are assessed and whether assessors 
have narrowed the curriculum they teach in response to the NRS. 

Analysis of the NRS is not yet complete enough to support its use for 
the purposes of accountability and targeting training and technical 
assistance. First, HSB has not articulated a strategy for how it will 
use information from the NRS to meet its purposes. For example, it has 
not articulated what level of progress is expected, how it will use NRS 
scores to target training and technical assistance, or how it will hold 
grantees accountable for achieving results. Such decisions are 
important first steps in any test development process. Further, results 
from the first year of the NRS currently cannot be used to hold 
grantees accountable or to target training and technical assistance 
because HSB analyses have not yet shown that the NRS provides the scope 
and quality of assessment information needed for these purposes. The 
usefulness of educational tests is dependent on their consistency of 
measurement (their reliability), along with whether they measure what 
they are designed to measure (their validity). HSB has asserted that 
the NRS meets these criteria because it borrows certain material from 
existing tests that have met them, but the agency has not shown the NRS 
itself to be valid and reliable over time. Test developers generally 
use a pilot test to establish reliability and validity, but due to time 
constraints, HSB did not conduct a full pilot test. In addition, 
language experts advising HSB have raised serious concerns about 
whether the Spanish version of the NRS adequately measures the skills 
of Spanish-speaking children and whether results from the English and 
Spanish versions are comparable. Responding in part to these concerns, 
HSB has not yet used first year results of the NRS for accountability 
decisions and has stated that future accountability decisions will not 
be based solely on NRS results, but will reflect other grantee 
information as well. The NRS also may not provide sufficient 
information to target training and technical assistance to the centers 
and classrooms that need it most. NRS results are aggregated across the 
many classrooms and centers that a grantee may operate and results are 
reported only at the grantee and delegate levels, because results are 
more reliable at these levels than at lower levels. However, a 
grantee's average score could mask variability among the multiple 
classrooms or centers and limit information on where technical 
assistance would be most effectively targeted. Furthermore, NRS results 
alone do not indicate why results may be high or low, or what type of 
training or technical assistance would be appropriate. 
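
The masking effect described above can be illustrated with a short 
sketch. The grantee names and center-level scores below are 
hypothetical and are not drawn from NRS data; the sketch also assumes 
that each center enrolls the same number of children, so that the mean 
of the center averages equals the grantee average. 

```python
from statistics import mean, pstdev

# Hypothetical center-level average scores for two grantees.
# These values are illustrative only; they are not NRS results.
grantee_centers = {
    "Grantee A": [50.0, 50.0, 50.0, 50.0],
    "Grantee B": [30.0, 40.0, 60.0, 70.0],
}


def summarize(center_scores):
    """Return (grantee-level average, spread across centers).

    Assumes equal enrollment per center, so the unweighted mean of
    center averages stands in for the grantee-level average.
    """
    return mean(center_scores), pstdev(center_scores)


summaries = {name: summarize(scores) for name, scores in grantee_centers.items()}

for name, (average, spread) in summaries.items():
    print(f"{name}: grantee average = {average:.1f}, "
          f"spread across centers = {spread:.1f}")
```

Both hypothetical grantees report the same grantee-level average, but 
only the center-level figures show that two of Grantee B's centers 
score well below the others. 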

To help ensure that the NRS successfully and efficiently achieves its 
purposes, we are recommending that the HHS Assistant Secretary for the 
Administration for Children and Families (ACF) take several actions, 
including articulating plans for use of the NRS results, providing 
additional technical information on the test results, and conducting 
additional study of unintended effects and alternative ways for 
improving the test. ACF generally agreed with GAO's recommendations and 
described some of the actions it has already begun. In addition, ACF 
submitted detailed comments on certain aspects of the draft report, 
including comments concerning the level of evidence for the validity of 
the NRS. 

Background: 

Established in 1965, Head Start is a federally funded early childhood 
development program that served over 900,000 children at a cost of $6.8 
billion in 2004. Head Start offers low-income children a broad range of 
services, including educational, medical, dental, mental health, 
nutritional, and social services.[Footnote 1] Children enrolled in Head 
Start are generally between the ages of 3 and 5 and come from varying 
ethnic and racial backgrounds. Head Start is administered by HSB within 
ACF. HSB awards Head Start grants directly to local grantees. Grantees 
may develop or adopt their own curricula and practices within federal 
guidelines. Grantees may contract with other organizations--called 
delegate agencies--to run all or part of their local Head Start 
programs. Each grantee or delegate agency may have one or more centers, 
each containing one or more classrooms. In this report, the term 
"grantee" is used to refer to both grantees and delegate agencies. 
Figure 1 provides information on the numbers of Head Start grantees, 
delegate agencies, centers and classrooms. 

Figure 1: Head Start Grantees, Delegate Agencies, and Centers: 

[See PDF for image]

[End of figure]

Since the inception of Head Start, questions have been raised about the 
effectiveness of the program. In 1998, we reported that Head Start 
lacked objective information on the performance of individual grantees, 
and Congress enacted legislation requiring HSB to establish specific 
educational standards applicable to all Head Start programs and allowing 
development of local assessments to measure whether the standards are 
met.[Footnote 2] HSB implemented this legislation by developing the 
Child Outcomes Framework to guide Head Start grantees in their ongoing 
assessment of the progress of children. The Framework covers a broad 
range of child skill and development areas and incorporates each of the 
legislatively mandated goals, such as that children "use and understand 
an increasingly complex and varied vocabulary" and "identify at least 
10 letters of the alphabet." 

Since 2000, HSB has required every Head Start grantee to include each 
of the areas in the Framework in the child assessments that each 
grantee adopts and implements. The eight broad areas included in the 
Framework are language development, literacy, mathematics, science, 
creative arts, social and emotional development, approaches to 
learning, and physical health and development. Grantees are permitted 
to determine how to assess children's progress in these areas. These 
assessments are to align with the grantee's curriculum; as a result the 
specific assessments vary across the grantees. The assessments occur 3 
times each year and generally involve observing the children during 
normal classroom activities.[Footnote 3] The results of the assessments 
are used for the purposes of individual program improvement and 
instructional support and are not aggregated across grantees or 
systematically shared with federal officials. The NRS, prompted by the 
April 2002 announcement of President Bush's Good Start, Grow Smart 
initiative, builds on the 1998 legislation by requiring all Head Start 
programs to administer the same assessment, twice a year, to all 4- and 
5-year-old Head Start participants who will attend kindergarten the 
following year. 

When President Bush announced this initiative in April 2002, the 
initiative called for full implementation in fall 2003; as a result, the 
NRS was developed and prepared for implementation within an 18-month 
period. See figure 2. Shortly after the President announced this initiative, 
HSB hired a contractor to assist it in developing and implementing the 
NRS. The contractor, working closely with HSB, was responsible for the 
design and field testing of the NRS, including developing training 
materials to support national implementation of the reporting system by 
grantees.[Footnote 4] HSB also worked with the Technical Work Group and 
others throughout implementation of the NRS. The Technical Work Group 
includes 16 experts in such areas as child development, educational 
testing, and bilingual education. They advised HSB on the selection of 
assessments, the appropriateness of the assessments in addressing the 
mandated indicators, the technical merit of the assessments, and the 
overall design of the NRS. While the Technical Work Group members 
offered advice, the group members were not always in agreement with 
each other and HSB was not obligated to act on any of the advice it 
received. A list of the Technical Work Group members and their 
professional affiliations is included in appendix I. 

Figure 2: Timeline of Events Leading to Implementation of NRS: 

[See PDF for image]

[End of figure]

Through focus groups, teleconferences, and various correspondence, HSB 
officials communicated to Head Start grantees the purpose of the NRS 
and their plans for administering the assessment. Focus groups and 
discussions were held with various interested parties, including Head 
Start managers and directors and experts from universities and the 
public sector, on issues ranging from strengths and limitations of 
various assessment tools to strategies for assessing non-English 
speaking children. HSB also received input through a 60-day public 
comment period, from mid-April to June 2003. 

Another contractor developed a Computer-Based Reporting System (CBRS) 
for the NRS. Local Head Start staff use the CBRS to enter descriptive 
information about their grantees, centers, classrooms, teachers, and 
children, as shown in table 1, as well as to keep track of which 
children have been assessed. HSB analyzes the descriptive information 
from the CBRS in conjunction with the child assessment data to develop 
reports on the progress of specific subgroups of children. For example, 
HSB can report separately on the average scores of children enrolled in 
part-day programs and those enrolled in full-day programs. 

Table 1: Examples of Information Included in Computer-Based Reporting 
System (CBRS): 

Program information: 
* Program name; 
* Director name; 
* Number of delegates; 
* Number of centers; 
* Number of family day care centers; 
* NRS lead for program; 

Center information: 
* Center name; 
* Center type; 
* Enrollment year start date; 
* Enrollment year end date; 
* NRS center lead name; 

Classroom level information: 
* Teacher name; 
* Classroom type; 
* Day option; 
* Total enrollment; 
* Number of additional teaching staff; 
* Teacher entry date to classroom; 

Assessor information: 
* Name; 
* Highest grade or year of school completed; 
* Highest degree held in Early Childhood Education or related field; 

Teacher information: 
* Teacher name; 
* In what languages is teacher fluent? 
* Total years teaching; 
* How many years teaching Head Start? 
* Highest grade or year of school completed; 
* Child Development Associate credential; 

Child information: 
* Child name; 
* DOB; 
* Date of entry into classroom; 
* Child unique ID from center; 
* Years in preschool Head Start; 
* Does child have a disability? 
* Does child speak a language other than English at home? 
* If yes, how well does child speak English? 
* If yes, what is primary language? 
* Ethnicity/race. 

Source: Head Start National Reporting System, Computer-Based Reporting 
System Train-the-Trainer Manual, Prepared by Xtria, LLC, February 2004. 

[End of table]

HSB, with assistance from the contractors, worked to ensure local staff 
received adequate training on administering the assessment and using 
the CBRS, and provided guidance on how to obtain consent from parents. 
Training and certification of all assessors was required so that all 
assessors would administer the NRS in the same way. Two-and-a-half-day 
training sessions were held at eight sites throughout the U.S. and 
Puerto Rico during July and August 2003. Roughly 2,800 individuals 
completed the training, of whom 484 were certified in both English and 
Spanish. In turn, these certified trainers held training sessions 
locally to train and certify additional staff who would be able to 
administer assessments. 

The development of educational tests is a science in itself, to which 
university departments, professional organizations, and private 
companies are devoted. Among the most important concepts in test 
development are validity and reliability. Validity refers to whether 
the test results mean what they are expected to mean and whether 
evidence supports the intended interpretations of test scores for a 
particular purpose. Reliability refers to whether or not a test yields 
consistent results. Validity and reliability are not properties of 
tests; rather, they are characteristics of the results obtained using 
the tests. For example, even if a test designed for 4th graders were 
shown to produce meaningful measures of their understanding of 
geometry, this would not necessarily mean that it would do so when 
administered to 2nd or 6th graders or with a change in directions 
allowing use of a compass and ruler. Test developers typically 
implement "pilot" tests that represent the actual testing population 
and conditions and they use data from the pilot to evaluate the 
reliability and validity of a test. This process generally takes more 
than 1 year, especially if the test is designed to measure changes in 
performance. 
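
As a rough illustration of what consistency of results can mean in 
practice, the sketch below computes the correlation between two 
administrations of the same test to the same children, one common way 
of quantifying reliability. The scores are hypothetical, and this is 
not presented as the method used by HSB or its contractors. 

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return covariance / sqrt(var_x * var_y)


# Hypothetical scores for five children tested twice (illustrative only).
first_administration = [10, 12, 15, 18, 20]
consistent_retest = [11, 13, 16, 19, 21]    # same ranking of children
inconsistent_retest = [18, 10, 20, 12, 15]  # ranking largely scrambled

r_consistent = pearson_r(first_administration, consistent_retest)
r_inconsistent = pearson_r(first_administration, inconsistent_retest)

print(f"consistent retest:   r = {r_consistent:.2f}")
print(f"inconsistent retest: r = {r_inconsistent:.2f}")
```

A correlation near 1 indicates that the two administrations rank 
children in much the same way; a correlation near 0 indicates that the 
results are not consistent. A high correlation alone says nothing about 
validity, which requires separate evidence. 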

In the remainder of the report, we will discuss how the focus of the 
NRS was determined and the assessment was developed, HSB's response to 
problems in initial implementation as well as some implementation 
issues that remain unaddressed, and the extent to which the assessment 
meets the professional and technical standards to support specific 
purposes identified by HSB. 

NRS Assesses Selected Skills Using Adaptations of Other Assessments: 

The NRS assesses vocabulary, letter recognition, and simple math skills, 
and screens for understanding of spoken English. As initially conceived 
by HSB, the NRS was to gauge the progress of Head Start children in 13 
congressionally mandated indicators of learning. However, time 
constraints and technical matters precluded HSB from assessing children 
on all of the indicators and led HSB to consider, and eventually adopt, 
portions of other assessments for use in the NRS. 

The 18 months from announcing the Good Start, Grow Smart initiative, of 
which the NRS is a part, to implementing the assessment was not enough 
time for HSB to develop a completely new assessment. Therefore, HSB, 
with the advice of its contractor and the Technical Work Group, chose 
to borrow material from existing assessments. Concerns raised by 
Technical Work Group members and the contractor about the length and 
complexity of the assessment and the technical adequacy of individual 
components eventually led to limiting the areas assessed in the NRS, 
from 13 skills to 6. The six legislatively mandated skills that HSB 
targeted included whether children in Head Start: 

* use increasingly complex and varied spoken vocabulary;

* understand increasingly complex and varied vocabulary;

* identify at least 10 letters of the alphabet;

* know numbers and simple math operations, such as addition and 
subtraction;

* for non-English speaking children, demonstrate progress in listening 
to and understanding English; and 

* for non-English speaking children, show progress in speaking English. 

In April and May of 2003 an assessment that included 5 components 
covering the 6 skills was field tested with 36 Head Start programs to 
examine the basic adequacy of the NRS, as well as the method for 
training assessors, and the use of the CBRS. The field test also 
included a Spanish version of the NRS. Based on the field test, one 
component--phonological awareness, or one's ability to hear, identify, 
and manipulate sounds--was eliminated. While this component examined an 
area that experts have linked to prevention of reading difficulties, 
the test used to assess it was problematic. HSB moved forward with the 
other components of the NRS. The four components of the NRS each 
measure one or more of the six legislatively-mandated indicators. 

The four components that make up the NRS are from the following tests: 

* Oral Language Development Scale (OLDS) of the Pre-Language Assessment 
Scale 2000 (Pre-LAS 2000),

* Third Edition of the Peabody Picture Vocabulary Test (PPVT-III),

* Head Start Quality Research Centers (QRC) letter-naming exercise, 
and 

* Early Childhood Longitudinal Study of a kindergarten cohort (ECLS-K) 
math assessment. 

Some or all of each test was previously used for other studies, and the 
PPVT and letter naming were previously used in studies of Head Start 
children.[Footnote 5] Three of the four tests were modified from their 
original version, as shown in table 2. Figures 3 and 4 are examples 
from the letter naming and early math skills components of the NRS. 
Figure 5 is an example of the type of item used in the vocabulary 
(PPVT) component of the NRS. 

Table 2: Description of NRS Components and Their Modifications: 

NRS components: Oral Language Development Scale (OLDS) of the Pre-LAS 
2000 (comprehension of spoken English); 
Modifications to components: NRS includes two subtests from the 
original assessment; 
Description of components: Simon Says--The child is asked to follow the 
instructions that "Simon says," such as "Simon says, 'Touch your 
toes.'"; Art Show--The child is presented with a series of 10 pictures 
and asked to name or explain what is in each picture; 
Legislatively-mandated skill measured by component: Use increasingly 
complex and varied spoken vocabulary; For non-English speaking 
children, demonstrate progress in listening to and understanding 
English; For non-English speaking children, show progress in speaking 
English. 

NRS components: Third Edition of the Peabody Picture Vocabulary Test 
(PPVT-III); 
Modifications to components: NRS includes 24 items from what was 
originally a 144-item test; 
Description of components: The child is asked to point to pictures to 
demonstrate understanding of words representing parts of the human body 
or their functions, activities of daily living, emotions and feelings, 
work/career-related activities, and plants, animals, and their 
habitats; 
Legislatively-mandated skill measured by component: Understand 
increasingly complex and varied vocabulary. 

NRS components: Head Start Quality Research Centers (QRC) letter-naming 
exercise; 
Modifications to components: None; 
Description of components: The child is shown all 26 letters of the 
alphabet, divided into three groups of 8, 9, and 9 letters, and 
arranged in approximate order of item difficulty, and is asked to 
identify the letters he or she knows by name; 
Legislatively-mandated skill measured by component: Identify at least 
10 letters of the alphabet. 

NRS components: Early Childhood Longitudinal Study of a kindergarten 
cohort (ECLS-K) math assessment; 
Modifications to components: NRS includes items in the easier range of 
the original assessment; 
Description of components: Using pictures, the child is asked about a 
range of math skills: number recognition of 1-digit numerals, basic 
geometric shapes, matching number names with objects, counting, simple 
addition and subtraction, and interpreting simple measurements and 
graphic representations; 
Legislatively-mandated skill measured by component: Know numbers and 
operations. 

Source: GAO analysis of HHS documentation. 

[End of table]

Figure 3: Example of NRS Letter Naming Instructions and Task: 

[See PDF for image]

[End of figure]

Figure 4: Example of NRS Early Math Skills Instructions and Task: 

[See PDF for image]

[End of figure]

Figure 5: Example of Type of Vocabulary Instructions and Task Used in 
the NRS: 

[See PDF for image]

[End of figure]

The Head Start Bureau Has Been Responsive to Some Implementation Issues 
Raised during First Year of NRS, but Others Remain: 

HSB has been responsive to some specific implementation concerns about 
the NRS, but other issues remain that might pose problems in the 
future. HSB already has made modifications to NRS training materials, 
the CBRS, and how the Spanish NRS is administered. In addition, HSB is 
working with the Technical Work Group to explore the feasibility of 
adopting a sampling strategy and including a measure of 
social-emotional development in the NRS. HSB has told grantees not to make 
changes to their programs based on the first year of the NRS, but our 
survey found that some grantees have changed instruction to emphasize 
areas covered in the test.[Footnote 6] While some such change may be 
appropriate, HSB currently is not monitoring whether grantees are 
changing the content of instruction to de-emphasize areas not tested or 
adopting inappropriate styles of teaching. 

HSB Has Responded to Some Implementation Issues That Arose during the 
First Year of NRS: 

Based on grantee feedback about their experiences during the first year 
of NRS implementation, HSB has already responded to some concerns by 
providing additional guidance on handling children's behavior, making 
it easier for Head Start staff to use the CBRS, and changing the order 
in which the Spanish and English versions of the NRS are administered 
to Spanish-speaking children. These changes are, in part, a response to 
feedback from local assessors and concerns raised by Technical Work 
Group members. During our site visits, some assessors described the 
2003 NRS training as rigid, with a lot of emphasis placed on following 
the script. HSB addressed these concerns in the 2004 spring refresher 
training video. Assessors agreed that this video better reflected the 
situations they encountered when assessing young children, such as a 
child who fidgets, has to go to the bathroom or wants a drink of water 
during an assessment. 

In addition to changing training material, HSB added several new 
features to the CBRS in response to information contractors gleaned 
while fielding assessors' phone calls for technical assistance. For 
example, the CBRS initially required local Head Start staff to type in 
all necessary information about their students, but the fall 2004 
version of the CBRS allowed local staff to update information about 
their children using information from the previous year or by 
transferring information from other computer systems. 

Another change to the NRS is the order in which the Spanish and English 
assessments are administered to Spanish-speaking children. Some 
Technical Work Group members noted that when the NRS is administered 
first in English and then in Spanish, Spanish-speaking children with 
limited English proficiency may experience difficulty and frustration 
during the English test. These feelings of frustration or failure could 
affect a child's disposition--and a child's responses--when later 
taking the Spanish version. Thus, the validity of the 
Spanish assessment might be compromised. During summer 2004, Migrant 
and Seasonal Head Start Programs administered the assessment in Spanish 
first. Based on the positive response they received from local 
assessors, HSB instructed all programs to follow this format in fall of 
2004. 

HSB Is Considering Sampling Strategies and Broadening NRS to Include a 
Measure of Social-Emotional Development: 

HSB is considering ways to deal with two issues raised during the first 
year of implementation: the burden on grantees in dedicating staff for 
the assessments and the limited range of skills that were assessed in 
the NRS. In particular, HSB is considering the feasibility of sampling 
to minimize the burden that grantees experienced in assessing all 4- and 
5-year-old Head Start participants who will attend kindergarten the 
following year. According to our survey, finding time to conduct 
assessments presented at least a moderate challenge to an estimated 63 
percent of grantees and allocating staff to administer the NRS 
presented at least a moderate challenge for an estimated 42 percent of 
grantees during the first year of the NRS. According to most of the 
assessors we spoke to (8 of 12) during our site visits, local staff 
neglected other tasks, juggled tasks, or took work home because they 
were occupied with administering the NRS. Assessors also mentioned 
having to reschedule training and reallocate staff because of the NRS. 

Several Technical Work Group members and grantees have suggested 
sampling as a way for the NRS to provide better information while 
reducing the burden on grantees. Sampling would allow staff to spend 
more time in the classroom and would cost less. Responding to these 
suggestions, HSB is working with some members of the Technical Work 
Group to identify various sampling strategies and their practical 
implications. These sampling strategies include matrix sampling, which 
involves taking a subset of items from the larger assessment and 
randomly assigning them to test takers, thereby avoiding the need to 
administer all items to all test takers. Matrix sampling would allow 
for more items to be included and, therefore, more in-depth assessment 
of the subjects covered by the test. Drawing an appropriate sample is 
complicated, however, and it might be difficult to learn how subgroups 
are doing, by region or subpopulation, using sampling or matrix 
sampling. 
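
As a rough illustration of the matrix sampling idea described above, 
the following Python sketch splits an item pool into smaller forms and 
randomly assigns one form to each test taker. The item names, the 
60-item pool, the number of forms, and the function name are all 
assumptions made for illustration; they are not details of the NRS or 
of any sampling design HSB is considering.

```python
import random

def matrix_sample(item_pool, children, n_forms, seed=0):
    """Split the item pool into n_forms subsets and randomly assign one per child."""
    rng = random.Random(seed)
    items = list(item_pool)
    rng.shuffle(items)
    # Deal items round-robin so the forms are near-equal in size
    # and together cover the whole pool.
    forms = [items[i::n_forms] for i in range(n_forms)]
    # Each child is given one randomly chosen form, not the full pool.
    return {child: rng.choice(forms) for child in children}

item_pool = [f"item_{i:02d}" for i in range(1, 61)]  # hypothetical 60-item pool
children = [f"child_{i}" for i in range(1, 9)]       # hypothetical test takers
assignment = matrix_sample(item_pool, children, n_forms=3)

for child, form in assignment.items():
    print(child, "answers", len(form), "of", len(item_pool), "items")
```

In this sketch each child answers only a third of the pool, which is 
why a larger pool can be covered overall. With few children per form in 
a given region or subpopulation, however, estimates for those subgroups 
would rest on very little data.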

In addition to studying the feasibility of sampling, HSB is actively 
exploring ways to incorporate a measure of social-emotional development 
into the NRS. Technical Work Group members have argued that 
social-emotional development is critical to kindergarten success and that adding a 
measure of social-emotional development would begin to address 
criticisms that the scope of the NRS currently is too narrow. A 
Technical Work Group subcommittee has identified eight measures of 
social-emotional development for possible field-testing. In addition, 
HSB has directed its contractor to conduct a small pilot to assess the 
feasibility of these measures and to conduct focus groups to obtain 
teacher feedback on the measures. Following the pilot test and focus 
groups, the contractor will conduct a field test with 30 Head Start 
programs to determine the appropriateness and technical adequacy of the 
measures. 

HSB Has Not Yet Addressed Some Concerns: 

While HSB is addressing some issues associated with the NRS, additional 
implementation concerns have yet to be addressed. HSB currently lacks 
independent information to verify that grantees are assessing all of 
the children eligible to participate in the NRS. Thus, the potential 
exists for undetected errors or exclusion of children HSB intends to be 
assessed. HSB attempts to ensure it has accurate information in several 
ways. For example, HSB compares the number of 4- and 5-year-olds 
reported in the current year with information from the previous year 
and it analyzes the data for inconsistencies and 
discrepancies.[Footnote 7] However, beyond these checks, HSB does not 
have an independent way to confirm the number of children eligible to 
participate in the NRS. 

There is also a concern that local Head Start programs will alter their 
teaching practices and curricula based on their participation in the 
NRS. These alterations, whether intended or unintended, might have 
positive and negative consequences. Local assessors are generally Head 
Start staff and it is expected that they want their children to perform 
well on the NRS and that they will teach their children the specific 
skills measured in the NRS. An increased focus on teaching these skills 
could be positive to the extent they have been neglected. However, this 
focus would be detrimental if it resulted in narrowing the curriculum 
to exclude skills that are not measured on the NRS but that experts 
believe are equally important for children's development. HSB 
specifically told grantees not to make changes to their programs based 
on their initial NRS results and has provided guidance on appropriate 
instruction. Nonetheless, according to our survey of assessors, at 
least an estimated 18 percent of grantees changed instruction during 
the first year of NRS implementation to emphasize areas covered in the 
NRS. One assessor we interviewed explained that despite being told 
during NRS training that programs should not adjust their curricula, it 
is human nature to try to correct areas in need of improvement. Without 
additional information, it is not possible to determine whether changes 
in instruction are positive or negative. 

Despite HSB's assurances that it intends to use the NRS results only in 
the context of other information on performance, experts state that 
grantees' perception of the NRS as a "high stakes" test could 
compromise the test within a few years. Assessors are very involved in 
the scoring of the NRS, yet the NRS is evaluating the grantees that 
employ them; thus, they are not independent. Assessors' input and 
interpretations could make the grantee appear to accomplish its goals, 
whether it actually does or not. For example, one assessor commented 
that participating in the NRS had planted a seed that perhaps she 
should teach her children particular words that appear in the NRS, such 
as the word "altogether," which appears in the instructions. It is also 
worth noting that the words used to screen for understanding of English 
were exactly the same in fall 2003 and spring 2004, so that learning 
particular words would make a large difference. An independent expert 
argued that there needs to be continuous monitoring and retraining of 
NRS assessors, as there was during the first year of NRS 
implementation, to maintain quality control over the testing process. 
For the second year of the NRS, HSB has extended its effort to review 
the quality of assessment administration, but these efforts do not 
include monitoring of changes in classroom practices. 

Additionally, in the absence of clear direction from HSB, local Head 
Start staff might misinterpret the results and use them 
inappropriately. The Technical Work Group has been clear that NRS 
scores for classrooms and individual children are not reliable and 
should not be used at the classroom level or for individual child 
evaluation or instruction. Yet, two of the Head Start grantees we 
visited stated that they photocopied each child's responses before 
returning the completed scoring sheets and one stated that the grantee 
intended to use the individual test results to evaluate its own 
performance at the classroom level. Technical Work Group members have 
argued that local Head Start programs should be given clear information 
on how to interpret the NRS results and how to improve their programs 
if they are unhappy with their NRS scores; however, the Technical Work 
Group members themselves have expressed confusion about how to 
interpret NRS scores, given the technical issues that are discussed in 
detail in the next section. 

The Head Start Bureau Has Not Specified How NRS Results Will Be Used 
and Important Analyses Remain to Be Done: 

HSB has not said specifically how it will use the NRS results and HSB 
currently lacks analyses showing that the NRS provides the scope and 
quality of information needed to hold Head Start grantees accountable 
or target training and technical assistance. To support these purposes, 
the NRS must produce valid and reliable results on children's 
performance that would allow for clear conclusions about Head Start 
grantees' effectiveness in improving the academic performance of 
children. Due to time constraints, HSB did not conduct a pilot test 
that could have provided information to establish the reliability and 
validity of changes in the NRS results over time. Experts have also 
questioned the technical merit of the Spanish-language NRS. Apart from 
these concerns, the NRS results alone do not provide enough contextual 
information to support accountability decisions. Acknowledging some of 
these issues, HSB has stated that accountability decisions will not be 
based solely on NRS results, and it will consider other grantee 
information, though it has not explicitly described how NRS results 
will be interpreted. Finally, because multiple classrooms are averaged 
to produce grantee results and this average may mask variability among 
different classrooms, NRS results are of limited use to target training 
and technical assistance to the classrooms where assistance is needed 
most. 

Head Start Bureau Has Not Stated How It Will Use NRS Results to Achieve 
Its Purposes: 

Head Start Bureau officials have stated in general terms that they will 
use NRS results to improve program performance, target training and 
technical assistance and hold Head Start grantees accountable; however, 
it remains unclear whether the NRS' purposes will be realized because 
HSB has not explained how assessment results will be used. For example, 
as of February 2005, HSB had not specified what grantee scoring level 
constitutes adequate performance. In addition, it had not indicated 
whether HSB would adjust scores to account for age or other differences 
among the children grantees serve, how it would account for students 
with disabilities, or whether adequate performance would be measured in 
absolute terms (e.g., the average score or the percentage of children 
that score above a certain level) or by growth in performance 
(performance change from fall to spring assessment). 
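
The difference between these ways of defining adequate performance can 
be sketched in a few lines of Python. The scores and the cutoff below 
are invented for illustration; HSB had not specified any scoring level, 
and nothing here reflects actual NRS data or scoring rules.

```python
from statistics import mean

# Hypothetical fall and spring scores for the same five children.
fall = [8, 12, 15, 9, 11]
spring = [12, 14, 15, 13, 16]
CUTOFF = 13  # hypothetical level; no such level had been specified

# Absolute measures: the spring average and the share of children
# scoring above the cutoff.
spring_average = mean(spring)
share_above_cutoff = sum(s > CUTOFF for s in spring) / len(spring)

# Growth measure: the average fall-to-spring change per child.
average_gain = mean(s - f for f, s in zip(fall, spring))

print("spring average:", spring_average)
print("share above cutoff:", share_above_cutoff)
print("average gain:", average_gain)
```

The same set of scores can look different under each definition, which 
is one reason the choice among them would need to be stated in advance.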

Professional standards for educational testing require that test 
developers specify how results will be used prior to developing a test 
so that judgments can be made about the appropriateness of the test. 
The specific uses of the NRS dictate the specific technical criteria it 
should meet. For example, if HSB intends to hold grantees accountable 
for increasing their assessment scores by a particular percentage, the 
NRS would need to be sensitive enough to reliably measure increases of 
that size. Several Technical Work Group members have emphasized the 
point that HSB should have determined exactly how it intended to use 
the NRS as a first step in the development of the NRS. As of February 
2005, HSB officials had not indicated when they would make decisions 
about the specific uses of the NRS data or when they would provide this 
information to grantees. 

This ambiguity has left some grantees wondering what the consequences 
of their assessment results could be. Assessors from 6 of the 12 Head 
Start grantees we visited said they were concerned about how HSB would 
use the NRS. Assessors from two grantees expressed apprehension that 
the results would be misinterpreted as evidence regarding the 
effectiveness of the program. One assessor suggested that HSB should 
share with local Head Start staff how it plans to use the data because 
it would generate greater support for the NRS among staff. These 
findings are consistent with a quality assurance study, commissioned 
by HSB, that recommended HSB provide more 
information on how it will use the results of the NRS assessments, 
especially with respect to implications for training and technical 
assistance, program improvement, and funding, to alleviate the concerns 
of grantees.[Footnote 8] HSB has stated that it is focusing on how to 
work with grantees on understanding NRS results and how to use the 
information to make improvements through training and technical 
assistance. 

Results from First Year Cannot Be Used to Hold Grantees Accountable 
Because Important Analyses Have yet to Be Completed or Documented: 

In order to use the NRS for the purpose of holding grantees accountable 
for children's progress, HSB needs to demonstrate that the NRS will 
provide reliable and valid information. As of February 2005, HSB had 
not, however, conducted certain analyses on NRS results to establish 
the validity and some aspects of the reliability of the assessment. A 
test is considered valid when it measures what it is supposed to 
measure and evidence supports the intended interpretations of test 
scores for a particular purpose. Reliability refers to whether a 
test yields consistent results, meaning that if a child in Head Start 
took the NRS on, say, a different day, his or her score would be 
similar. 
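
One common way to quantify this kind of consistency is a test-retest 
correlation between two administrations of the same test. The Python 
sketch below computes a Pearson correlation for invented scores; the 
data, the variable names, and the choice of this particular statistic 
are assumptions for illustration and do not describe the analyses HSB 
performed.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six children on two nearby days.
day_1 = [10, 14, 9, 17, 12, 15]
day_2 = [11, 13, 9, 18, 12, 14]

retest_r = pearson_r(day_1, day_2)
print("test-retest correlation:", round(retest_r, 2))
```

A value near 1 would indicate that children are ranked similarly on 
both days. It would not, by itself, show that the test measures what it 
is supposed to measure, which is the separate question of validity.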

HSB tested the reliability of particular NRS items through a short 
field test, but given the time constraints on the development of the 
NRS, HSB did not run a more extensive "pilot" test prior to full 
implementation. The field test results provided some information on the 
reliability of the NRS components for one point in time, which 
generally was strong at the grantee level. However, HSB lacked 
information on the range of growth that children might experience over 
the course of a year and--consequently--did not have the data to show 
that the test produces valid and reliable results on change from fall 
to spring. Some assessors also have expressed doubt about whether the 
NRS accurately measures change over time. According to our survey of 
NRS assessors, about a quarter of assessors agree that the NRS 
accurately measures the progress of their Head Start children from fall 
to spring. Further, without additional data from a pilot test, HSB 
could not fully validate the NRS and ensure that its use for the 
intended purposes was appropriate. 

Despite not conducting a pilot test, HSB stated that the NRS was 
technically sound in large part because it borrowed sections from tests 
that produced valid and reliable results in previous studies. Relying 
on this past work instead of conducting a new pilot test allowed HSB to 
develop the NRS within a very short time frame, but there are problems 
with this approach. The sample of children in these past studies is not 
always the same as the Head Start children with regard to age, home 
language, culture, or range of socio-economic status. Moreover, some of 
the tests used in the past were modified for use in the NRS by either 
limiting the questions asked or modifying the instructions. Without 
further analyses of the actual NRS implementation data, it is 
impossible to determine whether interpretations of the NRS results for 
the purpose of accountability are valid. Data from the first year of 
implementation could now be used to conduct some of these analyses and 
make determinations. For this reason, some Technical Work Group members 
have suggested that the first year of NRS implementation should have 
been considered a pilot test. HSB officials stated recently that they 
would be working with the Technical Work Group and a new advisory 
committee to continue to review the quality, reliability, and validity 
of the NRS assessment. 

Technical Work Group members have noted specific concerns with the 
approach and format of the NRS that may be threats to its validity. For 
example, Technical Work Group members have criticized the math section 
for asking children to refer to items pictured on a page rather than 
providing physical items (e.g., blocks) to handle and have argued that 
the instructions are complicated for 4- and 5-year-old children. They 
argue children might fail items not because they lack math skills, but 
because they do not understand the instructions or lack the 
ability to perform the math operations without items that can be 
manipulated. Technical Work Group members also questioned whether the 
letter-naming task is a valid measure of how many letters the children 
know. Given the layout of the letters on the page, a child can miss 
letters even if he or she actually knows the names of the letters, or 
may tire of naming them and seek to see what is on the next page. 
Several of the assessors we interviewed echoed these concerns and also 
raised concerns about the quality of the pictures and choice of 
vocabulary used in the PPVT component of the NRS. Due in part to these 
concerns, only about half of lead assessors believe that the NRS 
accurately portrays the majority of their children's abilities. 

Currently, HSB cannot use the results from the Spanish version of the 
NRS for accountability purposes because it has not been demonstrated 
that this version produces reliable and valid results or that its 
results are comparable to those from children tested in English. While 
it is important that a Spanish version was developed because 
20 percent of Head Start children speak Spanish, experts have 
questioned the reliability of the Spanish NRS results and criticized 
other aspects of this version. First, the Spanish version of the NRS 
was not standardized for the Spanish-speaking Head Start population. 
Because the country of origin and class of a child's family affect the 
Spanish dialect he or she speaks, there are important language 
differences among subpopulations, making such standardization 
important. For example, the Spanish spoken in Puerto Rico differs from 
that in Mexico and children from these countries are likely to 
recognize and use different words in test questions and answers. A 
number of NRS assessors commented to us that the Spanish terms used in 
the NRS were unfamiliar to their children and, in some cases, 
unfamiliar to the staff as well. A second problem with the Spanish NRS 
is that the English and Spanish versions are scored differently in that 
English answers are acceptable on the Spanish version, but not vice 
versa. This presents a problem because bilingual children may know some 
things in English and other things in Spanish. For example, a child 
might know the Spanish words for household items and the English words 
for numbers and math concepts. As an indication of this, one-third of 
Spanish-language NRS assessors found that on the Spanish version of the 
NRS many of their children responded correctly in English, but not in 
Spanish. 

Members of the Technical Work Group and experts in bilingual testing 
have also questioned whether the Simon Says and Art Show components of 
the NRS can be used appropriately to track children's progress in 
English, as HSB intends. They express concerns that these components, 
designed simply as a screener to identify children who might have 
difficulty understanding English, do not provide useful information on 
the extent of English understood. 

In addition to addressing concerns about the reliability and validity 
of the NRS directly, it is important that HSB's analyses and results 
are easy for other knowledgeable people to understand and use. 
Professional standards call for a technical manual addressing issues 
such as reliability and validity, as well as clearly specifying the 
intended uses and interpretations of the tests and cautioning against 
unintended misuses. According to all three of the independent experts 
who reviewed the technical aspects of the NRS at our request, the 
documentation of the reliability and validity of the NRS is not as well 
organized as would be desirable.[Footnote 9] They stated that given the 
importance of the validity of the NRS, a technical manual that brings 
all the evidence together in one place would be valuable. The expert 
reviewers reported that, in some cases, relevant material for 
evaluating the procedures and evidence to support the reliability and 
validity was provided, but was not organized in one place. For other 
areas, especially concerning the empirical work related to the Spanish 
version, documentation was not provided. For example, the information 
on the Spanish version of the test was limited to descriptions of 
procedures and summaries (e.g., "reliabilities were in the moderate to 
high range") and did not include documentation that would have made it 
possible for the reviewers to confirm the findings. 

HSB Acknowledges that NRS Alone Does Not Provide Range of Information 
and Context Needed for Making Accountability Decisions: 

The NRS by itself does not provide sufficient information to draw 
conclusions about the effects of Head Start grantees on children's 
outcomes--information that would support use of the NRS for Head Start 
grantee accountability. The NRS does not measure all aspects of Head 
Start, but only a limited range of the areas on which Head Start 
focuses and which contribute to children's school readiness. For 
example, the NRS does not include measures related to science, creative 
arts, approaches to learning, physical health and development, or 
social and emotional development, areas on which all Head Start 
programs are required to focus. Further, the cognitive areas included 
in the NRS are measured using a very narrow source of data that is not 
sufficient to evaluate the effects of Head Start grantees on the full 
range of child outcomes. For the area of literacy, the test measures 
how well children can identify letters, but not whether they can 
recognize rhymes or understand that letters make sounds--both aspects 
of "phonemic awareness," which is believed to be an area critical for 
preventing reading difficulties. For the area of language development, 
the test measures how well children can identify pictures by name, but 
not grammar, usage, or expressive speech. 

The Head Start Bureau has acknowledged the limited scope of the NRS and 
has expressly urged Head Start grantees to continue implementing their 
local assessments of the broader range of Head Start activities. The 
Associate Commissioner for the Head Start Bureau has stated that the 
Bureau does not intend to make decisions about grantees based solely on 
NRS data. Rather, the NRS information will be combined with 
comprehensive program level data collected on program designs and staff 
patterns; funded and actual enrollment; health, education, disability, 
and family services delivered; and demographic, social, and other 
trends.[Footnote 10] Many Technical Work Group members have stated that 
this type of contextual information is necessary for the NRS to be a 
useful part of an overall program evaluation design. 

In addition to measuring a limited range of the areas on which Head 
Start focuses, the NRS does not include all of the 4-year-old children 
who participate in Head Start. Most notably, children who speak neither 
English nor Spanish, about 4 percent of Head Start children otherwise 
eligible to participate in the NRS, are excluded from the NRS. Some 
grantees do not have such children in their classrooms while others may 
include many such children. In addition, a number of children are 
excluded from the NRS due to prolonged absence and the scores of some 
children who do participate in the NRS are later excluded due to 
administrative reporting errors. 

Application of NRS in Targeting Training and Technical Assistance 
Requires Further Development: 

NRS results are most reliable at the grantee level, but results at the 
grantee level are not the most useful for identifying where training 
and technical assistance should be targeted because some grantees 
include a large number of locations and classrooms. Using average 
scores at the grantee level to target training and technical assistance 
can mask the variability that underlies them. An average score gain for 
a grantee may reflect high gains among children in only a few 
classrooms, while the scores of children in other classrooms 
did not change or actually declined. The NRS data would allow for 
more effective targeting of training and technical assistance if the 
data could be used at the center and classroom levels, but currently 
the NRS cannot be used in this way. Given this limitation, HSB has 
stated that it might use NRS results to target training to a particular 
region of the country or to support a national training initiative in a 
particular skill area rather than to target specific grantees. 
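
The masking effect described above is simple arithmetic, as the 
following Python sketch shows. The classroom names and gains are 
invented for illustration and are not NRS data. The sketch is meant 
only to show how an average can hide differences, not to suggest that 
current NRS scores are reliable at the classroom level.

```python
from statistics import mean

# Hypothetical fall-to-spring gains per child, grouped by classroom.
gains_by_classroom = {
    "classroom_A": [6, 7, 5, 6],
    "classroom_B": [0, 1, -1, 0],
    "classroom_C": [-2, 0, -1, -1],
}

# The grantee-level average pools every child across classrooms.
all_gains = [g for gains in gains_by_classroom.values() for g in gains]
grantee_average = mean(all_gains)

# Classroom-level averages show where the gains actually came from.
classroom_averages = {
    room: mean(gains) for room, gains in gains_by_classroom.items()
}

print("grantee average gain:", round(grantee_average, 2))
for room, avg in classroom_averages.items():
    print(room, "average gain:", avg)
```

Here the grantee shows a positive average gain even though only one of 
the three hypothetical classrooms improved.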

The NRS, by itself, cannot identify which particular aspects of the 
Head Start program, if any, contributed to a grantee's particular NRS 
results and this imposes some limitations on its utility for targeting 
training and technical assistance. The NRS does not directly assess the 
performance of Head Start grantees, such as by assessing the quality of 
the classroom environment or teacher-child interactions. Rather, the 
NRS assesses children's performance as an indirect measure of grantee 
performance. To ensure that the NRS can be used as a valid indicator of 
grantee performance (vs. variations in student age or other 
characteristics), experts believe it would be important to link NRS 
data to other observations known to distinguish more and less 
successful programs. In its quality assurance study of the NRS, HSB 
found that local Head Start staff were not sure how to use the fall 
2003 results that were reported at the grantee level. Likewise, in our 
survey of NRS assessors we found that almost one-third of assessors 
believed the NRS did not provide useful information for their programs. 

Some members of the Technical Work Group have suggested that HSB 
further investigate the assumption that targeting training and 
technical assistance at the grantee or broader level can affect the 
progress made by children on certain academic skills. They argue that, 
if it is found that the classroom level matters, then the focus of 
analysis and reporting should be redirected and efforts could be made 
to increase the reliability of the scores at the classroom level. 

Conclusions: 

The NRS is an important step toward meeting a long-standing need for 
systematic data on children's progress in Head Start and grantees' 
performance. Developing such a system is a challenging endeavor, and 
considerable care and resources have gone into the project so far. At 
the same time, the technical standards applicable to HSB's planned uses 
for the assessment results need to be met. In addition, the system 
should be implemented with the greatest efficiency and caution against 
unintended negative consequences. The current NRS has strengths as well 
as areas in need of refinement, further investigation, and development. 

While the NRS provides some information on child outcomes among Head 
Start grantees, HSB has not yet articulated how it intends to interpret 
and use this information for the purposes of informing decisions about 
Head Start accountability and targeting training and technical 
assistance. Without further guidance, there is confusion among Head 
Start grantees about what level of performance is expected of them and 
how NRS results from their programs might be used to hold them 
accountable. Out of anxiety about potential uses of the test, grantees 
may be inappropriately narrowing the educational activities provided 
through Head Start to match those included in the NRS, even though 
instructed not to do so. Thus far, HSB has not established an ongoing 
mechanism for monitoring the extent to which the NRS has such effects 
on instruction. 

Other key steps that HSB has not taken include validating component 
tests and determining the reliability and validity of the NRS results 
across time. In addition, it has not compiled complete, well-organized 
documentation on the analyses conducted during test development and 
implementation, making it difficult for independent experts to evaluate 
the full technical merits of the English and Spanish versions of the 
NRS. Further, HSB lacks a mechanism for ensuring that all English- and 
Spanish-speaking Head Start children who are eligible to participate in 
the NRS are assessed. Without such a mechanism and additional analyses, 
and the assurances they provide, the potential exists that the NRS will 
produce results that are not useful for program evaluation. Moreover, 
without further work on test validation, HSB cannot use the NRS for 
making decisions about grantees. 

Finally, HSB's decision to assess all children with the full NRS 
assessment, rather than assessing a sample of children with a sample of 
items, has created a logistical challenge for many local Head Start 
grantees who must conduct the assessments, and limited the depth of 
information the NRS can provide about the learning of Head Start 
children in particular skill areas. At the same time, developing a 
sampling or matrix sampling strategy is complicated, especially for 
gathering information on the performance of subgroups of grantees, such 
as by region. 

Recommendations for Executive Action: 

To help ensure that the NRS successfully and efficiently achieves its 
purposes, we are recommending that the HHS Assistant Secretary for ACF 
take steps to better monitor some aspects of NRS implementation and 
examine means of improving its efficiency, including steps to: 

* monitor the effects of the NRS on local Head Start instructional 
practices;

* improve the management and accuracy of its data on the number of 
children eligible for and participating in the NRS; and: 

* work with the Technical Work Group to determine the feasibility of 
sampling options for administering the NRS, including documentation of 
their costs and benefits. 

In addition, we are recommending that the Assistant Secretary for ACF 
reduce uncertainty about the appropriate uses of the NRS by taking 
additional steps to: 

* determine how the NRS data will be used for the purposes of 
accountability and targeting training and technical assistance, and 
clearly communicate this information to grantees;

* use the first year of NRS results to conduct further study to ensure 
that the results are reliable and valid for both the English and 
Spanish versions and that the results are appropriate for the intended 
purposes; and: 

* compile detailed technical information on the NRS, including 
appropriate uses, in a single, well-organized document and make this 
information publicly available. 

Agency Comments and Our Evaluation: 

ACF provided written comments on a draft of this report, which are 
reprinted in appendix III. ACF generally agreed with GAO's 
recommendations and stated that it had taken the following actions: 

ACF's contractors are conducting additional analyses of the first year 
NRS results to ensure that future results are reliable and valid. 

ACF's contractors are preparing a detailed technical report. 

ACF has engaged its contractors and TWG in the preparation of an 
options paper with recommendations for sampling. 

ACF is examining changes that occur in local curriculum implementation 
and teaching practices. 

Further, ACF indicated that it will examine ways to improve the 
management and accuracy of its data on the number of children eligible 
for and participating in the NRS. 

ACF's positions regarding the NRS evolved over the course of our 
review, as evidenced by ACF's decision not to include the 2003-2004 NRS 
results in the 2004-2005 program monitoring process, its modification 
of training materials, and changes ACF made to the CBRS. ACF expressed 
in its comments a continued willingness to receive recommendations and 
advice. 

While generally agreeing with our recommendations, ACF also submitted 
detailed comments on certain aspects of the draft report. Several of 
these comments concerned the level of evidence for the validity of the 
NRS. For example, ACF cited ongoing analyses of validity and noted that 
most of the tests in the NRS have been used in other studies. However, 
while further evidence of validity may be forthcoming, the data 
available at the time of our review did not fully document that the 
tests provide for valid inferences about program performance or 
children's progress from fall to spring. If the test is to be used as a 
measure of program performance or to assess changes in child outcomes, 
it is important to ensure that it is sensitive to the range of 
development typically demonstrated in Head Start. Based on our analysis 
and that of the TWG and independent experts, we continue to believe 
that further study is necessary to ensure that the NRS results are 
reliable and valid and that the results are appropriate for the 
intended purposes. 

ACF also commented at length on our finding that, according to our 
survey of assessors, at least an estimated 18 percent of grantees 
"changed instruction during the first year of NRS implementation to 
emphasize areas covered in the NRS." ACF does not dispute that such 
changes were made, but suggests they may be appropriate, which we had 
noted in the draft report. In addition, ACF made a number of technical 
comments that we have incorporated as appropriate. 

We are sending copies of this report to the Assistant Secretary for 
ACF, appropriate congressional committees, and other interested 
parties. We will also make copies available to others upon request. In 
addition, the report will be available at no charge on GAO's Web site 
at http://www.gao.gov. Please contact me at (202) 512-7215 if you or 
your staff have any questions about this report. Other major 
contributors to this report are listed in appendix IV. 

Signed by: 

Marnie S. Shaul: 
Director, Education, Workforce and Income Security Issues: 

[End of section]

Appendix I: Objectives, Scope and Methodology: 

We designed our study to examine (1) what information the National 
Reporting System (NRS) is designed to provide, (2) how the Head Start 
Bureau (HSB) has responded to implementation issues raised by the Head 
Start grantees and experts during the first year of NRS implementation, 
and what issues remain to be addressed, and (3) whether the NRS 
provides HSB with the quality of information it needs to meet its 
goals. We obtained information about these objectives through the 
following methods: 

* Conducted in-person interviews with representatives from HSB, its 
contractors, and early childhood professional organizations. 

* Reviewed documents chronicling the steps HSB took in developing and 
implementing the NRS and delineating the professionally accepted 
standards for test development. 

* Conducted a mail survey of a nationally representative sample of Head 
Start grantees and delegates. 

* Conducted in-person interviews with staff at 12 Head Start programs 
in 5 states. 

* Conducted interviews with all of the members of the Technical Work 
Group. 

* Contracted with individuals recommended by the National Academy of 
Sciences as experts in the areas of psychometrics and the educational 
testing of Spanish-speaking and bilingual children. 

We conducted our work between May 2004 and February 2005 in accordance 
with generally accepted government auditing standards. 

Interviews with Head Start Bureau and Relevant Parties: 

To obtain information on the steps HSB took in developing and 
implementing the NRS, we conducted in-person and/or telephone 
interviews with HSB and its contractors or subcontractors (Westat, 
Mathematica, and Xtria), using semi-structured interview protocols. A 
representative of HSB was present at each of the interviews with its 
contractors. We asked HSB officials questions about the purpose of the 
NRS, reporting NRS results, revisions and updates to the NRS, reactions 
to NRS critics, and other related matters. We asked Westat staff 
questions regarding: (1) the validity, reliability, and other analyses 
of NRS results; (2) test development and revision; (3) test 
administration, scoring, and reporting; (4) testing individuals of 
diverse linguistic backgrounds; and (5) testing individuals with 
disabilities. We asked Xtria staff about focus groups they conducted, 
Computer-Based Reporting System (CBRS) training, and the CBRS itself. 
We asked Mathematica staff about their Quality Assurance Study 
methodology and findings. 

We interviewed representatives of the National Head Start Association 
(NHSA) to obtain information on what NHSA staff and their members 
learned from the first year of NRS implementation and to obtain their 
opinion on the extent to which the NRS comports with professional 
standards. We interviewed representatives of the National Association 
for the Education of Young Children (NAEYC) to learn how the NRS 
comports with their recommendations for assessing young children. 

Review of Documents: 

To obtain information chronicling the steps HSB took in developing and 
implementing the NRS and information about the quality of the NRS 
results, we reviewed documents provided by HSB and its contractors. 
These documents included, for example, minutes from meetings with the 
Technical Work Group and others, minutes from focus groups, copies of 
informational memos to Head Start grantees on the implementation of the 
NRS, reports of results from field testing, and reports of fall 2003 
NRS results. 

To obtain information on the professionally accepted standards for test 
development, we reviewed the Standards for Educational and 
Psychological Testing, which is sponsored and published jointly by the 
American Educational Research Association, the American Psychological 
Association, and the National Council on Measurement in Education. That 
document provides the preeminent, universally accepted guidance for 
the development and evaluation of high-quality, psychometrically robust 
assessment instruments. 

Survey of NRS Lead Assessors: 

To obtain information on implementation issues raised by the Head Start 
grantees during the first year of NRS implementation, we drew a 
stratified random probability sample of 472 grantees or delegates from 
a study population of 1,820 grantees or delegates of Head Start 
Programs during the 2003-2004 school year. We selected our sample from 
six strata defined by the total number of Head Start tests administered 
and the number of Head Start tests administered in Spanish in the 
2003-2004 school year. Ultimately, we received 376 completed 
questionnaires, for an overall response rate of 80 percent. The division of the 
population, the division of the sample, and the division of the 
respondents across the six strata can be found in table 3. Each sampled 
grantee or delegate was subsequently weighted in the analysis to 
represent all the members of the population. 

Table 3: Sample Disposition: 

Stratum number: 1; 
Stratum description: At least 200 tests and at least 100 Spanish tests; 
Total population size: 180; 
Total sample size: 125; 
Number of respondents: 98. 

Stratum number: 2; 
Stratum description: Less than 200 tests and at least 100 Spanish 
tests; 
Total population size: 22; 
Total sample size: 22; 
Number of respondents: 17. 

Stratum number: 3; 
Stratum description: At least 200 tests and between 1 and 99 Spanish 
tests; 
Total population size: 327; 
Total sample size: 90; 
Number of respondents: 80. 

Stratum number: 4; 
Stratum description: Less than 200 tests and between 1 and 99 Spanish 
tests; 
Total population size: 575; 
Total sample size: 98; 
Number of respondents: 77. 

Stratum number: 5; 
Stratum description: At least 200 tests and no Spanish tests; 
Total population size: 171; 
Total sample size: 48; 
Number of respondents: 39. 

Stratum number: 6; 
Stratum description: Less than 200 tests and no Spanish tests; 
Total population size: 545; 
Total sample size: 89; 
Number of respondents: 65. 

Total; 
Total population size: 1,820; 
Total sample size: 472; 
Number of respondents: 376. 

Source: GAO. 

[End of table]
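
The arithmetic behind table 3 can be checked with a short script. The 
sketch below transcribes the stratum figures from the table, recomputes 
the totals and the overall response rate, and derives one possible set 
of analysis weights. The weighting rule shown (stratum population 
divided by stratum respondents) is an assumption made for illustration; 
the report does not specify the exact formula that was used. 

```python
# Stratum figures transcribed from table 3:
# stratum number -> (population size, sample size, respondents).
strata = {
    1: (180, 125, 98),
    2: (22, 22, 17),
    3: (327, 90, 80),
    4: (575, 98, 77),
    5: (171, 48, 39),
    6: (545, 89, 65),
}

total_population = sum(p for p, _, _ in strata.values())
total_sample = sum(s for _, s, _ in strata.values())
total_respondents = sum(r for _, _, r in strata.values())

# Overall response rate: completed questionnaires over sampled units.
response_rate = total_respondents / total_sample

# Assumed weighting scheme (not specified in the report): each respondent
# represents population / respondents units within its own stratum.
weights = {h: p / r for h, (p, _, r) in strata.items()}

# Under that assumption the weights sum back to the population size.
weighted_total = sum(weights[h] * r for h, (_, _, r) in strata.items())

print(total_population, total_sample, total_respondents)
print(f"Response rate: {response_rate:.1%}")
for h, w in weights.items():
    print(f"Stratum {h}: weight per respondent = {w:.2f}")
```

Under this assumed scheme, respondents in the heavily sampled strata 
(such as stratum 2, where every unit was sampled) carry small weights, 
while respondents in the large, lightly sampled strata carry larger 
ones. 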

We developed the survey questionnaire and pretested the content and 
format of this questionnaire five times with NRS lead assessors, either 
in-person or on the telephone. During these pretests, we asked the NRS 
assessors whether the questions were clear and unbiased and whether the 
terms contained in the questionnaire were accurate and precise. We made 
changes to the questionnaire based on the pretest results. 
Questionnaires were mailed to the sample of NRS lead assessors in 
August 2004 and follow-up calls were made to those assessors whose 
responses were not received within 2 weeks. 

Because we followed a probability procedure based on random selections, 
our sample of delegates and grantees is only one of a large number of 
samples that we might have drawn. Because each sample could have 
provided different estimates, we express our confidence in the 
precision of our particular sample's results as 95 percent confidence 
intervals. These are intervals that would contain the actual population 
values for 95 percent of the samples we could have drawn. As a result, 
we are 95 percent confident that each of the confidence intervals in 
this report will include the true values in the study population. All 
percentage estimates from our sample have margins of error (that is, 
half-widths of confidence intervals) of plus or minus 6 percentage points or 
less, at the 95 percent confidence level, unless otherwise noted. 
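
To make the margin-of-error calculation concrete, the sketch below 
computes a stratified estimate of a percentage and its 95 percent 
margin of error. The stratum population sizes and respondent counts are 
taken from table 3, but the "yes" counts are hypothetical and do not 
correspond to any survey question. The sketch also assumes that 
respondents behave like a simple random sample within each stratum and 
that a normal approximation is adequate; the report does not describe 
the estimation method in this detail. 

```python
import math

# (population size, respondents) per stratum, from table 3.
strata = [(180, 98), (22, 17), (327, 80), (575, 77), (171, 39), (545, 65)]

# Hypothetical number of "yes" answers per stratum; not survey results.
yes_counts = [30, 5, 22, 20, 10, 15]

population_total = sum(pop for pop, _ in strata)

estimate = 0.0
variance = 0.0
for (pop, n), yes in zip(strata, yes_counts):
    p = yes / n                     # stratum sample proportion
    share = pop / population_total  # stratum share of the population
    fpc = 1 - n / pop               # finite population correction
    estimate += share * p
    variance += share**2 * fpc * p * (1 - p) / (n - 1)

# 95 percent margin of error under a normal approximation.
margin = 1.96 * math.sqrt(variance)

print(f"Estimate: {estimate:.1%}")
print(f"95% confidence interval: {estimate - margin:.1%} to {estimate + margin:.1%}")
```

The finite population correction matters here because several strata 
were sampled at high rates, which narrows the interval relative to a 
calculation that ignores the population sizes. 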

In addition to sampling errors, the practical difficulties of 
conducting any survey may introduce other types of errors, commonly 
referred to as non-sampling errors. For example, differences in how a 
question is interpreted, the sources of information available to 
respondents, or the characteristics of people who do not respond can 
introduce unwanted variability into the survey results. We included 
steps in both the data collection and data analysis stage to minimize 
such non-sampling errors. For example, a survey specialist in 
combination with subject matter experts designed our questionnaire; the 
questionnaire was pretested with NRS assessors; data entry was verified 
to ensure accuracy; and another computer programmer verified the 
computer programs used for analysis. 

A copy of the survey questionnaire, including overall responses, is 
included in appendix II. 

Site Visits to Head Start Grantees: 

To obtain information on implementation issues raised by the Head Start 
grantees during the first year of NRS implementation, we also conducted 
site visits to 12 Head Start programs in 5 states (Colorado, Maryland, 
Massachusetts, Rhode Island, and Virginia), where we interviewed staff 
who conducted the assessments and, in some cases, observed them 
administering the NRS to children. The states and grantees chosen for 
site visits were judgmentally selected to include a range of enrollment 
sizes, types of program, rural and urban locations, and ethnic and 
racial populations. 

The interviews were conducted using a semi-structured interview guide 
that included questions about preparation for and logistics of 
administering the assessment; experiences of conducting the 
assessments; effects of the NRS on the children and program; reactions 
to the NRS results; use of the CBRS; other assessment measures in use 
at the program; and contextual information about the program and 
community. During our site visits, we spoke with the lead assessor and, 
in some cases, other Head Start staff, including other assessors, 
staff, and managers. With the exception of sites in Colorado, we 
conducted our site visits during May and June of 2004. We conducted our 
Colorado site visits during September 2004. In all cases, we asked the 
staff to refer to experiences during the 2003-2004 school year. We 
cannot generalize our site visit findings beyond the 12 sites we 
visited, but we have used these data for illustrative purposes in 
conjunction with our survey. 

Interviews with Technical Work Group: 

To obtain information on whether the NRS provides HSB with the quality 
of information it needs to meet its goals, we conducted telephone 
interviews with each of the 16 members of the Technical Work Group, 
using a semi-structured interview protocol. We asked the members about 
their professional backgrounds and involvement on the Technical Work 
Group; their understandings of the purpose of the NRS; their 
assessments of the completeness of the steps HSB took in developing and 
implementing the NRS; their assessments of the extent to which the NRS 
is reliable, valid, and consistent with professional standards; 
specific concerns about the NRS that members had raised during 
Technical Work Group meetings; and their opinions on how HSB should 
proceed with regard to the NRS. Each of the members stated that he or 
she could be candid in discussing these issues with GAO. We also 
observed two meetings of the Technical Work Group in May and October 
2004. 

Technical Work Group Members: 

Craig Ramey, Ph.D., Chairman: 
Distinguished Professor of Health Studies and Director, Georgetown 
University Center for Health Education: 
School of Nursing and Health Studies: 
Georgetown University: 
Washington, D.C. 

Clancy Blair, Ph.D., Co-Chairman: 
Assistant Professor: 
Human Development and Family Studies: 
Pennsylvania State University: 
University Park, Pa. 

Jason L. Anthony, Ph.D., Ed.S.: 
Research Assistant Professor: 
Texas Institute for Measurement, Evaluation, and Statistics: 
Department of Psychology: 
University of Houston: 
Houston, Tex. 

Margaret Burchinal, Ph.D.: 
Senior Scientist: 
Frank Porter Graham Child Development Institute: 
The University of North Carolina at Chapel Hill: 
Chapel Hill, N.C. 

Richard Clifford, Ph.D.: 
Senior Scientist: 
Frank Porter Graham Child Development Institute: 
The University of North Carolina at Chapel Hill: 
Chapel Hill, N.C. 

Linda Espinosa, Ph.D.: 
Associate Professor: 
311D Townsend Hall: 
College of Education: 
University of Missouri-Columbia: 
Columbia, Mo. 

Nicholas Ialongo, Ph.D.: 
Associate Professor: 
Bloomberg School of Public Health: 
Johns Hopkins University: 
Baltimore, Md. 

Graciela Italiano-Thomas, Ed.D.: 
CEO: 
Centro de la Familia de Utah: 
South Salt Lake, Utah: 

Jacqueline Jones, Ph.D.: 
Director, Initiatives in Early Childhood and Literacy Education: 
Educational Testing Service: 
Princeton, N.J. 

Ann P. Kaiser, Ph.D.: 
Professor of Psychology and Human Development: 
Director, Research Program on Communication, Cognitive, and Emotional 
Development: 
Vanderbilt University: 
Nashville, Tenn. 

Samuel J. Meisels, Ed.D.: 
President: 
Erikson Institute: 
Chicago, Ill. 

Fred Morrison, Ph.D.: 
Professor: 
Department of Psychology: 
University of Michigan: 
Ann Arbor, Mich. 

Robert C. Pianta, Ph.D.: 
Professor, William Clay Parrish, Jr. Chair in Education: 
Curry Programs in Clinical and School Psychology: 
University of Virginia: 
Charlottesville, Va. 

Kyle Snow, Ph.D.: 
National Institute of Child Health and Human Development: 
National Institutes of Health: 
U.S. Department of Health and Human Services: 
Bethesda, Md. 

W. Douglas Tynan, Ph.D., ABPP: 
Associate Professor of Pediatrics: 
Alfred I. duPont Hospital for Children: 
Jefferson Medical College: 
Wilmington, Del. 

Jane Wiechel, Ph.D.: 
Associate Superintendent: 
Center for Students, Families and Communities: 
Ohio Department of Education: 
Columbus, Ohio: 

Expert Reviews: 

To obtain information on whether the NRS provides HSB with the quality 
of information it needs to meet its goals, we contracted with 
individuals recommended by the National Academy of Sciences (NAS) as 
experts in the areas of psychometrics and the educational testing of 
Spanish-speaking and bilingual children. These independent experts 
reviewed documents provided by HSB and its contractors and provided 
written comments on the adequacy and appropriateness of the assessment. 
We also conducted follow-up telephone interviews with each of the three 
experts to reconcile variations in their written reviews. We developed 
our own conclusions based on the information provided by these experts. 
The three experts are listed below. 

Ronald K. Hambleton, Ph.D.: 
Distinguished University Professor for Research and Evaluation Methods: 
University of Massachusetts at Amherst: 
School of Education: 
Center for Educational Assessment: 
Amherst, Mass. 

Luis M. Laosa, Ph.D.: 
Principal Research Scientist, Emeritus: 
Educational Testing Service: 
Center for Education Policy and Research: 
Princeton, N.J. 

Robert L. Linn, Ph.D.: 
Professor: 
University of Colorado: 
Department of Education: 
Boulder, Colo. 

[End of section]

Appendix II: Survey Instrument: 

The survey instrument displayed here includes the population estimates 
for grantees overall. The confidence intervals for these estimates do 
not exceed plus or minus 6 percentage points. 

[See PDF for image]

[End of survey]

[End of section]

Appendix III: Comments from the Department of Health and Human 
Services: 

DEPARTMENT OF HEALTH AND HUMAN SERVICES:

ADMINISTRATION FOR CHILDREN AND FAMILIES: 
Office of the Assistant Secretary, 
Suite 600: 
370 L'Enfant Promenade, S.W.
Washington, D.C. 20447:

APR 20 2005:

Ms. Marnie S. Shaul:
Director, Education, Workforce and Income Security Issues:
U.S. Government Accountability Office: 
441 G Street, N.W.
Washington, D.C. 20548:

Dear Ms. Shaul:

The Administration for Children and Families appreciates the 
opportunity to provide comments on recommendations in the U.S. 
Government Accountability Office's draft report entitled, "Head Start: 
Further Development Could Allow Results of New Test to be Used for 
Decisionmaking" (GAO-05-343).

Should you have questions regarding our comments, please contact Windy 
Hill, Associate Commissioner of the Head Start Bureau, Administration 
on Children, Youth and Families, at (202) 205-8573.

Sincerely,

Signed by: 

Wade F. Horn, Ph.D.: 
Assistant Secretary for Children and Families:

Attachment:

COMMENTS OF THE ADMINISTRATION FOR CHILDREN AND FAMILIES ON THE 
GOVERNMENT ACCOUNTABILITY OFFICE'S DRAFT REPORT TITLED, "HEAD START: 
FURTHER DEVELOPMENT COULD ALLOW RESULTS OF NEW TEST TO BE USED FOR 
DECISIONMAKING" (GAO-05-343):

The Administration for Children and Families (ACF) appreciates the 
opportunity to comment on this Government Accountability Office (GAO) 
draft report. We appreciate the breadth of contact made in the 
preparation of this report.

GAO Recommendations:

To help ensure that the NRS successfully and efficiently achieves its 
purposes, we are recommending that the HHS Assistant Secretary for ACF 
take steps to better monitor some aspects of NRS implementation and 
examine means of improving its efficiency, including steps to:

* monitor the effects of the NRS on local Head Start instructional 
practices;

* improve the management and accuracy of its data on the number of 
children eligible for and participating in the NRS; and:

* work with the Technical Work Group to determine the feasibility of 
sampling options for administering the NRS, including documentation of 
their costs and benefits.

In addition, we are recommending that the Assistant Secretary for ACF 
reduce uncertainty about the appropriate uses of the NRS by taking 
additional steps to:

* determine how the NRS data will be used for the purposes of 
accountability and targeting training and technical assistance, and 
clearly communicate this information to grantees;

* use the first year of NRS results to conduct further study to ensure 
that the results are reliable and valid for both the English and 
Spanish versions and that the results are appropriate for the intended 
purposes; and:

* compile detailed technical information on the NRS, including 
appropriate uses, in a single, well-organized document and make this 
information publicly available.

ACF Comments:

ACF has widely publicized its commitment, need and intent for 
improvements in the implementation of the National Reporting System 
(NRS), including child assessment. We believe that the GAO 
recommendations mirror many of ACF's public statements, as well as 
accurately describe some of the action steps that are already in 
process.

The remaining GAO recommendations are also in keeping with those 
arising from our internal planning with the NRS contractors, the local 
programs and the Technical Work Group (TWG). Additionally, the 
Secretary of HHS will also be receiving recommendations from the newly 
formed Secretary's Advisory Committee (SAC) on Head Start 
Accountability and Educational Performance Standards, which will begin 
meeting this summer.

Specific comments related to the recommendations:

* ACF has already included a scheduled deliverable within the scope of 
work of the NRS contractors. Additional analyses are continuing to be 
conducted with the first year NRS results in order to ensure that 
future results are reliable and valid, and in order to be confident 
that the results are appropriate for the interim and final intended 
purposes. TWG and SAC will both assist ACF in the review of these 
analyses.

* ACF has included tasks that will result in the NRS contractors 
preparing a detailed technical report to expand beyond what is already 
included in the recently distributed "Report to Congress on Head Start 
Assessment." The new work is already in progress. We will make some 
version of the new document available to the public when it is cleared 
by ACF.

* ACF will examine ways to improve management regarding NRS 
participation. We believe that we can achieve this through the existing 
Computer-Based Reporting System data collection, data management, the 
quality assurance site visits, and as part of our overall 
responsibility for program monitoring.

* Prior to the release of the GAO report, ACF had engaged the NRS 
contractors and TWG in the preparation of an options paper with 
recommendations for sampling, including not only the benefits and cost 
implications for each approach but also what could or must be "given 
up" under the implementation of each approach. TWG and SAC will have a 
role in reviewing these recommendations and further advising ACF and 
HHS, respectively.

* ACF is examining and will continue to examine changes that occur in 
local curriculum implementation and teaching practices through at least 
three primary methods: on-site federal reviews, regular periodic 
contact of an assigned technical assistance liaison and the NRS quality 
assurance site visits.

Other Comments:

* ACF would like the title as well as pertinent references throughout 
the document to refer to the NRS rather than "the test." The child 
assessment alone is not synonymous with NRS.

ACF believes that the Year One Quality Assurance Study, though 
mentioned, lacked attention in this report.

Page 4, first full paragraph, and page 23, third paragraph - GAO states 
that HSB has asserted the validity and reliability of NRS measures 
because NRS borrows certain materials from existing tests that have met 
the validity and reliability criteria, but the agency has not shown NRS 
itself to be valid or reliable over time. Reliability and concurrent 
and predictive validity of the Head Start NRS measures were calculated 
using the Family and Child Experiences Survey (FACES) and other data on 
Head Start children. These results were included in the package of 
materials provided to GAO.

Ongoing analyses are being conducted to further demonstrate the 
reliability and validity of the NRS assessment data. For example, 
analyses comparing matched FACES data with NRS data are being conducted 
to validate the parallel assessment data collected by locally trained 
NRS assessors with those collected by trained, experienced, 
professional FACES data collectors. Preliminary analyses indicate that 
little difference is found between the two data sets.

Most of the subtests in the NRS battery have been used extensively in 
the Head Start FACES study, in the National Head Start Impact Study or 
in the Head Start Quality Research Center intervention studies 
involving more than 10,000 Head Start children, as well as in other 
major studies of low-income preschoolers. These measures have been used 
in the National Institute of Child Health and Human Development 
studies, the "Mother & Child Supplement" to the National Longitudinal 
Survey of Youth and in the "Child Development Supplement" to the Panel 
Study of Income Dynamics. The results of these assessments have proved 
to be highly stable from cohort to cohort, not only in terms of the 
level of achievement with which children enter or leave the Head Start 
program, but also in terms of their growth trajectories.

Analysis of longitudinal data from the Head Start FACES study has shown 
that vocabulary and letter-recognition assessments given in Head Start 
can account for nearly half of the variance in children's tested 
reading skills at the end of kindergarten, and 66 percent of the 
variance when tested in general knowledge at the end of kindergarten. 
Also, scores gained from vocabulary and letter-recognition assessments 
account for almost one-third of the variance in kindergarten reading 
skills and over one-quarter of the variance in kindergarten general 
knowledge.

Page 9, Figure 2 - ACF would like to see the report contain both a 
narrative and a timeline on NRS for the year 2004, not just for 2002 
and 2003 as is currently in the report. The activities of the GAO 
occurred during 2004, as did the first full year of ACF's 
implementation of NRS.

Page 11, first paragraph - GAO indicates that a true "pilot," rather 
than the summer field test of NRS, would take about a year to complete. 
ACF believes that by further examination of the Year I data, we will 
have data even beyond the scope of a one-year pilot effort.

The GAO report also states that HSB did not conduct a "full pilot 
test." The Head Start Bureau (HSB) conducted a field test of the NRS 
child assessment in the spring of 2003 with a national probability 
sample of 36 Head Start programs, including two migrant programs and 
two American Indian programs, resulting in a field test sample of over 
1,430 kindergarten-eligible English- and Spanish-speaking children. The 
results of the field test showed that the measures were appropriate for 
the Head Start population, capturing a range of ability levels in the 
assessment domains. Year I implementation results will add 
significantly to this information and what we know about the properties 
of the assessment over time.

Page 21, first paragraph - Though GAO has included a footnote to 
explain, "...actions taken by the Head Start Bureau's contractors are 
attributed to the Head Start Bureau itself," this note appears on this 
page long after readers can attribute actions to HSB. Since the report 
is written without disclosing what actions were taken or advised by 
whom, ACF would like the footnote to be moved to the beginning of the 
report or described in the opening narrative.

Page 26, third paragraph - GAO uses a figure of 13 percent to describe 
the number of children who speak neither English nor Spanish. Aggregate 
Program Information Report data indicate that programs reported 95 
percent of the children enrolled last year spoke either English or 
Spanish, leaving 5 percent who speak other languages. The number of 
children in NRS who spoke a language other than English or Spanish at 
home, as reported in the Computer-Based Reporting System, was 
approximately 4 percent or 19,000 in the fall of 2003.

HSB has two other concerns with the report. Our responses to these two 
concerns are rather lengthy in order to clarify them:

1. Page 17, first paragraph - The program office is concerned with the 
following statement in the report "... some grantees have changed 
instruction to emphasize areas covered in the test." The manner in 
which it is stated implies that this can only be negative and that it 
can only be attributable to NRS in any program in which it occurs. On 
the contrary, we believe this illustrates a powerful positive change, 
inasmuch as Head Start's heavy emphasis on instructional and curricular 
changes pre-date the implementation of NRS by several years. We explain 
our concern in detail.

As this country's largest and only federally funded, comprehensive 
early childhood program, we have learned a great deal from 
research-based practices that enhance young children's learning and 
development. 
Unless we ensure that programs are providing meaningful and challenging 
learning experiences through ongoing observation and assessment of 
children's progress as required by the Program Performance Standards, 
participation will have little value for children. Therefore, we are 
not surprised to learn that local programs reported to GAO that they 
are making changes in their curriculum and in their teaching practices. 
We believe that NRS may be giving them additional data upon which they 
are making such local decisions, rather than NRS serving as the sole 
source of such information upon which to base change decisions. We 
have, through various methods, specifically cautioned programs not to 
take actions of this nature. We believe that most programs are not 
using NRS Year I reporting in inappropriate ways.

The GAO report acknowledges in a small way that prior work has occurred 
in this area, yet GAO does not acknowledge that the prior work, rather 
than NRS alone, may be producing changes in curriculum and instruction. 
Prior to the NRS, the Head Start Child Outcomes Framework (Framework) 
defined the comprehensive nature of child development and early 
childhood education in Head Start by including the domains of language 
development, literacy, mathematics, science, creative arts, social and 
emotional development, approaches to learning, and physical 
development. This focus across all domains must remain within the local 
curriculum and within the local ongoing assessment.

Additionally, the Head Start Program Performance Standards require that 
all of these areas of development be supported through age-appropriate 
curriculum delivered through classroom or home-based programming with 
the integral involvement of parents. Therefore, the focus across all 
domains must remain within the local curriculum and within the local 
ongoing assessment.

ACF has been offering and continues to offer training, technical 
assistance and other resources to help programs look more closely at 
their local implementation and to make necessary changes. Additionally, 
some programs have made and others are actively engaged in making these 
types of changes as a result of either their required program 
self-assessment or local aggregation of child outcome data, and/or as a 
result of noncompliance or deficiencies identified and reported in the 
process of triennial monitoring. We recognize and applaud programs that 
are actively engaged in making appropriate changes in the areas of 
curriculum, ongoing assessment of child progress and early childhood 
instruction across domains.

Another example of our work that is influencing changes in local 
programs is the Head Start Leaders Guide to Positive Child Outcomes. 
This resource is based on the requirements of the Head Start Program 
Performance Standards and the Framework. This important document has 
been the basis of Head Start training, providing staff with specific 
strategies to strengthen curriculum and to foster children's progress 
in each of the identified domains. These strategies assist program 
staff in strengthening curriculum planning and implementation 
regardless of the specific curriculum used in individual programs.

Both ACF's regulations and resource materials provide examples of 
educational quality based on:

* intentional teaching;

* outcomes-oriented learning experiences;

* child engagement; and:

* challenging learning opportunities for small groups of children and 
for individual children.

2. Page 7, second paragraph - The GAO report states of non-NRS 
assessments, "The assessments occur 3 times each year and generally 
involve observing the children during normal classroom activities." 
This statement, though perhaps stated by one or more local programs, 
inaccurately describes grantee actions as related to two existing Head 
Start requirements. The first is the long-standing requirement for 
ongoing observations and ongoing assessment of each child's progress. 
Therefore, observing or assessing progress only three times a year 
would be a significant area of noncompliance, and more likely, a 
deficiency in that program. The statement on page seven further 
represents a misunderstanding and, therefore, an inappropriate 
implementation of the existing requirement. Three times per year each 
agency is required to aggregate, report and examine data from its 
locally designed and locally administered ongoing assessment of child 
progress. This is different from assessing children three times a year.

Head Start standards do not allow for "assessing three times per year"; 
rather, teachers must observe and record examples of children's 
development and learning on an ongoing basis throughout the year. 
Management requirements have programs review aggregate data from the 
assessment at three points in time during the year--the beginning, 
midpoint and the end. The information is reviewed program-wide, in 
aggregate, to assess children's status and progress on a wide range of 
areas identified in the Framework. This information is used to continue 
to plan the educational program for children as well as to inform the 
overall program assessment and planning process.

We are aware that NRS is providing an additional way for programs to 
look at children's progress over the course of a Head Start year. This 
may be contributing to a renewed focus on becoming more intentional and 
more deliberate regarding the early childhood educational services in 
local Head Start programs--the learning content, intentional teaching, 
and children's school readiness in the areas of both the Framework and 
the 1998 Congressionally mandated child outcomes.

As we look more closely at this type of change in local programs, we 
hope that we will be able to conclude that NRS is not currently the 
"cause" of the more intentional focus on school readiness, but rather 
that necessary changes are the result of a number of other factors, 
including:

* The 1998 Congressional mandate, specifying additional Program 
Performance Standards in language, literacy and numeracy/early 
mathematics and the subsequent Framework;

* The increased qualifications of teachers and the significant number 
with degrees;

* The increased focus on intentional teaching strategies shared through 
training based on research;

* The appropriate use of local outcomes data (not the NRS data);

* The appropriate use of the required program self-assessment;

* Information from research, including the finding that children's 
pre-school vocabulary is the best predictor of school success; and:

* Individual agency and grantee responses to findings from federal, 
on-site and triennial monitoring of compliance with all applicable laws 
and regulations.

HSB's emphasis on instructional change clearly pre-dates NRS, which was 
launched in 2002.

As stated earlier, separate from and prior to NRS, the Framework 
defined the comprehensive nature of child development and early 
childhood education in Head Start. Additionally, the Head Start Program 
Performance Standards require that areas of development be supported 
through age-appropriate curriculum delivered through classroom or 
home-based programming with the integral involvement of parents.

It is important to recognize that both the Head Start Program 
Performance Standards, which were initially issued in 1972 and revised 
in 1996, and the Framework issued in 2000, all pre-date NRS.

The 1998 reauthorization of the Head Start Act (The Act) requires the 
Secretary of HHS to establish "education performance standards to 
ensure the school readiness of children participating in Head Start," 
including assurances that children develop phonemic, print and 
numeracy/early mathematics skills; understand and use language to 
communicate; understand and use increasingly complex and varied 
vocabulary; develop and demonstrate an appreciation of books; and for 
English language learners, progress toward acquisition of the English 
language. The Act also required that the Head Start teacher 
qualifications be raised because of evidence that links classroom and 
teaching quality to the skills, knowledge and formal education of 
teachers.

Therefore, the Act, the Head Start Leaders Guide to Positive Child 
Outcomes, the Framework and the Program Performance Standards, as well 
as professional development experiences such as Mentor Coaching, all 
hold programs and local staff accountable for use of specific 
strategies to strengthen curriculum content, learning outcomes and 
intentional teaching, and to foster children's progress in each child 
development domain of the comprehensive Head Start program.

Ensuring developmentally appropriate programming provides a meaningful 
basis for observing and assessing children's progress and promoting and 
individualizing learning and development. NRS is providing an 
additional form of assessment reporting and an additional and renewed 
focus on local programs becoming more intentional and more deliberate 
regarding curriculum content, intentional teaching and children's 
school readiness, and is not the sole source or a source to replace 
existing requirements for local Head Start agencies.

ACF looks forward to additional recommendations as we move toward the 
use of NRS data and as we inform grantees and others about the use of 
the NRS data as another tool for accountability and providing training 
and technical assistance. 

[End of section]

Appendix IV: GAO Contacts and Staff Acknowledgments: 

GAO Contacts: 

Betty Ward-Zukerman (202) 512-2732, wardzukermanb@gao.gov; 
Heather McCallum Hahn (202) 512-2890, mccallumh@gao.gov: 

Staff Acknowledgments: 

Ramona Burton, Scott Heacock, Kathryn Rooney, Carolyn Boyce, Curtis 
Groves, Stu Kaufman, Joan Vogel, and Sid Schwartz made significant 
contributions to this report. 

FOOTNOTES

[1] Head Start regulations require that at least 90 percent of the 
children enrolled in Head Start come from families with incomes at or 
below the federal poverty guidelines, receiving public assistance, or 
caring for a foster child. In 2004, the federal poverty guideline for a 
family of four in the 48 contiguous states and the District of Columbia 
was $18,850. 

[2] See GAO, Head Start: Challenges in Monitoring Program Quality and 
Demonstrating Results, GAO/HEHS-98-186 (Washington, D.C.: June 1998), 
and Head Start: Curriculum Use and Individual Child Assessment in 
Cognitive and Language Development, GAO-03-1049 (Washington, D.C.: 
September 2003). 

[3] According to ACF officials, in addition to the assessments 
conducted as part of the Head Start Child Outcomes Framework, Head 
Start teachers must observe and record examples of children's 
development and learning on an ongoing basis throughout the year. 

[4] Analyses and actions taken by the Head Start Bureau's contractors 
are attributed to the Head Start Bureau itself. 

[5] Both the OLDS and the math assessment were used in the ECLS-K, and 
the PPVT-III was used with two cohorts of the Head Start Family and 
Child Experiences Survey (FACES). The Head Start Quality Research 
Centers letter-naming exercise was developed for use in Head Start 
curriculum studies. The ECLS-K is an ongoing study that focuses on 
children's early school experiences beginning with kindergarten and 
following children through fifth grade. FACES is a national 
longitudinal study of the development of Head Start children, their 
families, and Head Start programs and staff in a small sample of 
programs. 

[6] We use the terms "the test" and "the assessment" to make shortened 
reference to the NRS test battery. The NRS also incorporates a support 
infrastructure for the test battery, including a system for training 
staff to conduct the assessments and a computer-based reporting system. 
While the NRS may eventually be expanded to incorporate additional 
components, we examined it as implemented through spring 2004. 

[7] The current year's data are not available until December. 

[8] The Head Start Bureau awarded a contract to Mathematica Policy 
Research, Inc., to conduct an implementation study of the NRS in a 
randomly selected set of 35 Head Start programs. The research team 
observed a total of 119 local assessors, interviewed Head Start 
directors, NRS trainers, and data managers, and held focus groups with 
staff conducting the assessments to learn about their experiences. 
Mathematica also planned to visit four Migrant and Seasonal Head Start 
programs during spring 2004 and fall 2005. 

[9] See appendix I for a list of the expert reviewers and their 
affiliations. 

[10] See GAO, Head Start: Comprehensive Approach to Identifying and 
Addressing Risks Could Help Prevent Grantee Financial Management 
Weaknesses, GAO-05-176 (Washington, D.C.: Feb. 28, 2005). 
