Evaluation of the Teen Outreach Program® in Hennepin County, MN: Findings from the Replication of an Evidence-Based Teen Pregnancy Prevention Program

Final Impact Report for Hennepin County Human Services and Public Health Department
June 26, 2015

Prepared by Kimberly Francis, PhD; Michelle Woodford, MS; and Meredith Kelsey, PhD
Abt Associates, 55 Wheeler Street, Cambridge, MA 02138

Recommended citation: Francis, K., Woodford, M., and Kelsey, M. (2015). "Evaluation of the Teen Outreach Program in Hennepin County, MN: Findings from the Replication of an Evidence-Based Teen Pregnancy Prevention Program." Cambridge, MA: Abt Associates Inc.

Acknowledgements

We would like to recognize a number of people who made this evaluation possible. First, we thank Hennepin County Human Services and Public Health Department staff Katherine Meerse and Lorie Alveshere for their ongoing support and dedication to the evaluation. This close partnership was integral to the successful execution of the study. We would like to acknowledge the staff of Annex Teen Clinic, The Family Partnership, and Northpoint Health and Wellness, who not only implemented the program but also participated in interviews and often served as study liaisons to the schools. We also thank the many students, teachers, and school administrators who participated in the evaluation and without whom this contribution to the field would not have been possible.

Also contributing to this report from Abt Associates are Rob Olsen, who contributed to the evaluation design, provided guidance on the impact analysis, and commented on earlier drafts of the report; Fatih Unlu, who provided support on the impact analysis; and Rachel Luck, who assisted with data collection and qualitative analysis for the implementation study.

The Authors

This publication was prepared under Grant Number TP1AH000078 from the Office of Adolescent Health, U.S. Department of Health & Human Services (HHS). The views expressed in this report are those of the authors and do not necessarily represent the policies of HHS or the Office of Adolescent Health.

Table of Contents

Abstract
1. Introduction
   1.1 Research Questions
2. Intervention and Comparison Programming
   2.1 Intended Program Content
   2.2 Intended Program Delivery and Setting
   2.3 The Counterfactual Condition
3. Study Design
   3.1 Sample Recruitment
   3.2 Random Assignment
   3.3 Data Collection
      3.3.1 Impact Evaluation
      3.3.2 Implementation Study
   3.4 Outcomes for Impact Analyses
   3.5 Creation of the Analytic Sample
   3.6 Baseline Equivalence
   3.7 Analytic Approach
      3.7.1 Implementation Study
4. Study Findings
   4.1 Implementation Study Findings
      4.1.1 Adherence to Program Model
      4.1.2 Quality of Implementation
      4.1.3 Experiences of the Control Group
      4.1.4 Context
   4.2 Impact Study Findings
      4.2.1 Secondary Research Questions
Conclusion
5. References
6. Appendices
   Appendix A: Data Collection Efforts
   Appendix B: Implementation Study Data Sources
   Appendix C: Study Sample Flow
   Appendix D: Implementation Study Methods
   Appendix E: Summary of Sensitivity Tests
   Appendix F: Equation for Estimating Baseline Equivalence
   Appendix G: Impact Model Specification
   Appendix H: Non-Response Weights
   Appendix I: Approaches to Inconsistent Survey Responses
   Appendix J: Prevalence of Missing Baseline Covariates
   Appendix K: Receipt of Sexual Health Information at Follow-Up

List of Tables

Table 3.1. Behavioral outcomes used for primary and secondary research questions
Table 3.2. Summary statistics of key baseline measures for students responding to the short-term follow-up survey
Table 3.3. Summary statistics of key baseline measures for students responding to the long-term follow-up survey
Table 4.1. Crosstab of weekly session attendance by CSL hours completed
Table 4.2. Estimated effect using data from the short-term survey to address the primary research question
Table 4.3. Estimated effects using data from the short- and long-term surveys to address secondary research questions
Table A.1. Outcome of teacher recruitment effort (Cohorts 1 and 2 pooled)
Table A.2. Timing of data collection efforts used in the impact analysis of TOP®
Table A.3. Summary of data collection procedures used in the impact analysis of TOP®
Table B.1. Data used to address implementation research questions
Table C.1. Cluster and youth sample sizes by intervention status
Table D.1. Methods used to address implementation research questions
Table E.2. Estimated effects using data from short-term follow-up to address the primary research question
Table E.3. Estimated effects using data from short-term and long-term follow-up to address secondary research questions
Table H.1. Baseline covariates used in logit models of response probability
Table J.1. Prevalence of missing data for baseline covariates
Table K.1. Percentage of participants who self-reported receiving sexual health information in the last 12 months, by treatment status

Abstract

Grantee: Hennepin County Human Services and Public Health Department. Project Director: Katherine Meerse, Katherine.Meerse@hennepin.us

Evaluator: Abt Associates. Evaluation Lead: Kim Francis, Kimberly_Francis@abtassoc.com

Intervention Name: Teen Outreach Program® (TOP®)

Intervention Description: TOP® is a youth development and service learning program for youth ages 12 to 17 designed to reduce teenage pregnancy and increase school success by helping youth develop a positive self-image, life management skills, and realistic goals. The TOP® program model consists of three components implemented in school, after school, or in community settings over nine consecutive months: (1) weekly curriculum sessions, (2) community service learning (CSL), and (3) positive adult guidance and support. The TOP® Changing Scenes® curriculum is separated into four age- and stage-appropriate levels, which range from Level 1, typically for youth ages 12 or 13, to Level 4, typically for youth age 17. The curriculum focuses on the presence of a consistent, caring adult; a supportive peer group; skill development; sexual health; and sexual behavior choices. The intended program dosage for each participant is a minimum of 25 weekly sessions (one per week at 40–50 minutes each) and at least 20 hours of CSL over a nine-month period. One or two facilitators implement TOP®, generally in groups of 10 to 25 participants, and select and order the lessons based on the needs and interests of the group. Lessons may be repeated or omitted, may take place over more than one session, and more than one lesson may be implemented in a single session. There is no fidelity requirement to implement sexual health-related lessons. For this evaluation, lessons from Levels 1–4 of the program were delivered to seventh to tenth graders via a co-facilitation approach, using both the classroom teacher and a staff member provided by a local community-based organization. Across Levels 1–4, facilitator pairs had 140 lessons from which to choose. Consistent with the program model, there was no standardization of lessons across the implementation. All program facilitators, including classroom teachers, received a 19-hour curriculum training from a certified TOP® replication partner. The program was implemented in different types of classes, such as social studies or health, and in groups both smaller than 10 and larger than 25 participants.

Counterfactual: Business as usual.

Counterfactual Description: Study participants scheduled into control teachers' classes received the "business as usual" counterfactual. That is, control teachers were not trained in the TOP® curriculum and taught their classes as they normally would. These classes varied across schools and included core subjects, such as social studies, and noncore subjects, such as study hall/advisory and health. Participating schools varied in terms of the standard sexual health or pregnancy prevention resources they offered students. Most had health classes with a sex-education component and/or guest presenters speaking about sexual health topics throughout the school year. One school had an on-site health clinic.
Primary Research Question(s): What is the average impact of TOP®, relative to the control group, on engagement in recent sexual activity three months after the program ends for the treatment group?[1]

[1] There is no equivalent of "program end" for the control group or for treatment group members who leave the program. Follow-up surveys were administered to both groups 12 and 24 months after enrollment in the study.

Additional Outcomes: Engagement in unprotected sex, delayed initiation of sexual activity, school performance (self-reported course failure and school suspension), school engagement and attachment, educational expectations, self-efficacy (general), self-efficacy (civic), and civic responsibility.

Sample: The analytic sample used to answer the primary research question consisted of 1,223 youth from 24 middle and high schools in Hennepin County, Minnesota, including alternative and public charter schools. Students were enrolled in either school year 2011–2012 (Cohort 1) or 2012–2013 (Cohort 2). The target group was students in grades seven through ten (generally 12–16 years old). Participation in the study sample was contingent on the schools' willingness to participate and the availability of (1) a school-year-long class that met with the same student cohort throughout the school year and (2) a class period of sufficient length to complete a lesson from TOP®'s Changing Scenes® curriculum each week. Eligibility criteria for students were: (1) enrollment in a randomly assigned teacher's class at the time of the baseline survey, (2) parent/guardian written consent, (3) written participant assent, (4) ability to move, unassisted, through the baseline survey in English or Spanish, and (5) for Cohort 2, no prior participation in TOP®.

Setting: TOP® was delivered in middle schools, high schools, alternative schools, and public charter schools in Hennepin County. It was implemented during school hours in classes that span an entire school year with the same cohort of students. The subject of the class in which TOP® was placed differed across schools (for example, social studies, study hall, health), but within each school, TOP® was offered in only one class subject.[2]

[2] One school offered TOP® in two class subjects, with each subject offered at a different grade level.

Research Design: This is a cluster randomized controlled trial. Teachers were randomized within schools to the treatment and control conditions before the school year started to enable the treatment teachers to complete the curriculum training. Notification of random assignment occurred after students were scheduled into the study teachers' classes and the consent and baseline survey processes were complete. Students were scheduled into classes according to regular school procedures without parents, students, or scheduling staff knowing the teachers' study group status. All eligible students were required to obtain active written parent/guardian consent to participate in the study. The same consent process was used across treatment and control teachers' classes, including the same "blinded" parent/guardian consent form. By providing written consent, the parents acknowledged that their children might or might not be offered the TOP® program. In all cases, scheduling staff, students, and parents were unaware of the teachers' study group status until after the baseline surveys were completed. Since TOP® is part of the regular school curriculum, schools do not require parent permission for students to participate in TOP® programming, and there is no way for parents to opt their children out of any class, other than via state law.
To assess the impact of offering TOP®, students were surveyed three times: at baseline, before the intervention began for the treatment group; three months post-programming (short-term impacts); and 15 months post-programming (long-term impacts). Baseline data and subsequent follow-up data were collected using a Web-based survey. Paper surveys were used as back-up for baseline data collection. The pooled survey data from both cohorts (school years 2011–2012 and 2012–2013) were used to estimate program impacts using an intent-to-treat (ITT) analysis. Program fidelity and interview data were used to describe program implementation.

Impact Findings: There was no evidence that TOP® had an impact on the primary outcome, engaging in recent sexual activity at the short-term follow-up. No impacts were detected for any of the additional outcomes.

Implementation Findings: Program staff offered a median of 29 weekly sessions. Treatment group members attended a median of 27 weekly sessions and completed a median of 18 CSL hours. However, just 39 percent completed the minimum 20 hours of CSL, and 35 percent completed both 25 weekly sessions and 20 hours of CSL. The majority of students responding to the short-term follow-up survey reported high-quality staff interactions and engagement with the program. Over half of the control group reported receiving information about several sexual health topics at school, and 41 percent had participated in community service in the prior 12 months. Eight schools with control group members provided a school-wide community service or service learning opportunity unrelated to TOP®. There were no external events affecting implementation; one unplanned adaptation was granted to shorten the duration of the program from nine months to eight months where necessary to accommodate the parent consent and baseline survey processes.

Schedule/Timeline: Sample enrollment ended October 2012. The three-month post-program follow-up data collection ended November 2013, and the 15-month post-program follow-up data collection ended November 2014.

1. Introduction

A major priority for the U.S. Department of Health and Human Services (HHS) is finding ways to reduce teenage pregnancy. A key strategy for achieving this goal is the Teen Pregnancy Prevention Program, which invests in replicating existing evidence-based programs and identifying new ones for populations at highest risk for teen pregnancy. The County of Hennepin, Minnesota, was one of 16 grantees to receive funding from the Office of Adolescent Health (OAH) in 2010 to replicate with fidelity and rigorously evaluate evidence-based teen pregnancy prevention programs.[3] The county focused its strategy on the eight cities with the highest teen birth rates and selected the Teen Outreach Program® (TOP®) for replication in response to the community-identified need for affordable, appealing healthy youth development opportunities.

[3] Grantees selected program models from the HHS Teen Pregnancy Prevention Evidence Review, a list that includes abstinence education programs, comprehensive sex education programs, HIV/AIDS prevention programs, programs for expectant and parenting teens, and youth development programs.
At the time, Hennepin County's 2008 teen birth rate of 29.1 per 1,000 females age 15 to 19 was lower than the national rate of 41.5 but higher than Minnesota's rate of 27.2 (Minnesota Organization on Adolescent Pregnancy, Parenting, and Prevention, 2010). More significantly, Hennepin's overall rate masked critical disparities within the county and between racial and ethnic groups. Rates in six of the implementation cities exceeded the national rate, and two of those cities had rates more than 50 percent higher than the national rate.

This report describes the methods and results of the evaluation of the TOP® program as implemented in Hennepin County. The evaluation included two studies: (1) a study of the impact of offering TOP® to middle- and high school-aged youth (the impact study) and (2) a study of the context, implementation fidelity, and challenges faced in implementing the program (the implementation study).

Prior to this evaluation, the primary evidence of TOP®'s effectiveness was based on one randomized controlled trial conducted between 1991 and 1995 with 695 teens in 25 high schools across the United States (Allen, Philliber, Herrling & Kuperminc, 1997). The program took place in a mix of in-school and after-school settings, and the youth sample was predominantly female (85%) and African American (67%), with an average age of 15.8 years. The subgroup of adolescent girls participating in the program was significantly less likely than the control group to report a pregnancy during the academic year of the nine-month program; no effects were found for boys (on contributing to a pregnancy). The study was not designed to analyze whether this effect was sustained beyond the immediate post-test, nor did it include sexual risk-taking behavior outcomes. The study met the HHS Teen Pregnancy Prevention Evidence Review criteria for a high study rating, indicating that it was a well-implemented randomized controlled trial based on the evidence review standards in place in 2010 (Mathematica Policy Research & Child Trends, 2010; Goesling, Colman, Trenholm, Terzian & Moore, 2014).

1.1 Research Questions

The current evaluation tested the extent to which TOP®, when replicated with fidelity, produced impacts on sexual risk-taking behaviors in the short term and the longer term. The research questions were pre-specified and categorized as primary (to establish the effectiveness of the program) and secondary (additional questions about sexual risk behaviors to provide evidence suggestive of program impacts). The primary research question was:

What is the average impact of TOP®, relative to the control group, on engaging in recent sexual activity three months after programming ends for the treatment group?[4]

[4] There is no equivalent of "program end" for the control group or treatment group members who leave the program. Follow-up surveys were administered to both groups 12 months (short-term follow-up) and 24 months (longer-term follow-up) after enrollment in the study.

This research question measures the effect of offering TOP® both on delaying sexual intercourse (for those who were not sexually active at baseline) and on becoming abstinent (for those who were sexually active either at baseline or during the follow-up period). The analysis of this question will provide confirmatory evidence about TOP®'s impact on sexual behavior for Hennepin County's replication.
Five secondary research questions measure the impact of TOP® in a longer-term follow-up period, with subgroups, and on an additional sexual behavior outcome:

(1) What is the average impact of TOP®, relative to the control group, on engaging in recent sexual activity 15 months post-program?
(2) What is the average impact of TOP®, relative to the control group, on engaging in unprotected sex three and 15 months post-program?
(3) Among those sexually inactive at baseline, what is the average impact of TOP® on delayed initiation of sexual activity three and 15 months post-program?
(4) Do the average impacts of TOP® on engaging in recent sexual activity differ for male and female adolescents three and 15 months post-program?
(5) Do the average impacts of TOP® on engaging in recent unprotected sex differ for male and female adolescents three and 15 months post-program?

2. Intervention and Comparison Programming

TOP® is a youth development and service learning program designed to reduce teenage pregnancy and increase school success by helping youth develop a positive self-image, life management skills, and realistic goals. The TOP® program model consists of three components implemented over nine consecutive months by trained adult facilitators: (1) weekly classroom sessions, (2) community service learning (CSL), and (3) positive adult guidance and support. The intended program dosage for each participant is a minimum of 25 weekly sessions (one per week, 40–50 minutes each) and at least 20 hours of CSL over the nine months.

2.1 Intended Program Content

The TOP® model is characterized by its flexibility, which enables facilitators to best meet the developmental needs of the youth from week to week. At least 80 percent of the weekly classroom sessions are intended for lessons from the Changing Scenes® curriculum or for CSL activities. The curriculum lessons span such topics as healthy relationships, boundaries, goal setting, planning, communication, adolescent development, and conflict management. Program facilitators are free to choose from 140 lessons (and multiple activities within each lesson) across four levels and to implement them in an order that meets the needs of the participants. Lessons may be repeated more than once or implemented over more than one session, and multiple lessons may be implemented in one session. No specific lessons are required in order to meet fidelity requirements. Lessons on birth control and other sexual health topics comprise a small proportion of the available lessons and are also not required by the program developer for fidelity. Consistent with this approach, the choice of whether or not to implement sexual health lessons was left up to each individual pair of facilitators (the CBO staff member and classroom teacher). The curriculum lessons are aimed at improving youths' social-emotional and self-regulation knowledge and skills, future orientation, problem-solving skills, and level of school attachment and engagement.
CSL activities begin with the student participants determining the needs of their defined communities (e.g., school, neighborhood) and deciding on a group service project. The students may choose to pursue individual service projects instead of or in addition to a group project, and they may have more than one project over the course of the school year. The students plan and implement the project(s), and program facilitators provide guidance and support, as well as opportunities for reflection, linking the service experience to the Changing Scenes® curriculum content. Through the CSL experience, youth are expected to increase their knowledge and skills in the areas of community engagement and service learning, improve their ability to plan and set goals, and increase their sense of empathy.

Though no dosage requirement is associated with the third program component, positive adult guidance and support, program facilitators are expected to (1) structure the nine-month experience to meet the needs of the youth they are serving; (2) develop a pro-social group environment with emotionally and physically safe norms and expectations; (3) demonstrate caring for each youth; and (4) maintain a values-neutral position while facilitating discussions.

The TOP® theory of change proposes that if these three components are executed with fidelity and youth experience the immediate changes outlined above, they will have fewer incidences of pregnancy or fathering a child, as well as improved self-efficacy, school performance, and attitudes and skills toward service.[5] The primary and secondary outcomes in this evaluation focus on proximal sexual behaviors that ultimately lead to pregnancy or fathering a child.

[5] Summarized from the Wyman Center's Teen Outreach Program® Logic Model: http://teenoutreachprogram.com/wp-content/uploads/2014/12/TOP-Logic-Model-FORMATTED-3-17-15.pdf

2.2 Intended Program Delivery and Setting

Hennepin County is the thirty-third largest county in the United States by population, and almost a quarter of the population of Minnesota resides in its 45 cities (Hennepin County Public Affairs, 2013). The county partnered with three community-based organizations (CBOs) with experience providing sexual health programming to youth. The CBOs were responsible for:

• hiring and supervising staff to be frontline TOP® facilitators;
• recruiting schools, completing memorandums of understanding with each, and collaborating with classroom teachers to co-facilitate TOP®;
• collaborating with Hennepin County to ensure that the intervention was delivered with fidelity to the standards outlined by the program developer and OAH; and
• participating in ongoing training and technical assistance provided by Hennepin County.

The county planned to deliver TOP® in middle and high schools during school hours in classes that span an entire school year with the same cohort of students. Staff intended for CSL to take place during school hours and/or out of school hours, on the school campuses or off, depending on the nature of the projects chosen by youth and the logistical limitations of each school. The target age group was students in grades seven through ten (generally 12–16 years old). Students could participate in TOP® if their teacher was randomly assigned to incorporate it into their regularly scheduled class once per week.
TOP® was part of the regular school curriculum in the selected subjects, so parent permission was not required for students to participate. No opt-out option was offered other than the state law that allows parents to opt their children out of any class.

TOP® was intended to be delivered by two co-facilitating adults, the classroom teacher and a staff member employed by one of the three CBOs, regardless of class size. All program facilitators, including classroom teachers, were required to participate in a 19-hour curriculum training led by a certified TOP® replication partner. The CBO staff members also were to receive quarterly professional development training and ongoing technical assistance from Hennepin County.

None of the core components of the program had any planned adaptations. The co-facilitation approach can be considered a modification in that the program model does not require two adults to facilitate unless the student-to-trained-staff ratio is greater than 25:1. Hennepin County chose this co-facilitation approach (where CBO frontline staff were paired with classroom teachers) as a strategy to institutionalize support for TOP® in the schools over time and promote the sustainability of the program.

2.3 The Counterfactual Condition

The difference between the intervention and the counterfactual condition (what was available to the control group) must be large enough for the study to detect the effect of TOP® above and beyond what students are already offered. Study participants scheduled into control teachers' classes were meant to receive the "business as usual" counterfactual. That is, control teachers were not trained in the TOP® curriculum and taught their classes as they normally would in the absence of TOP®. The control teachers' classes varied across schools (within each school, they were the same class subject into which TOP® was placed) and included core classes, such as social studies, and noncore classes, such as study hall, life skills, and health. Most schools were assumed to offer some sexual health or pregnancy prevention resources to all students. For example, some were known to offer health classes with a sex education component or to invite guest presenters to speak about sexual health topics; one school had a health clinic on site.

3. Study Design

A cluster-randomized design was used to estimate the impact of TOP® on reducing sexual risk-taking behaviors among urban teens in Hennepin County. Random assignment, when implemented well, ensures that there are no systematic differences between treatment and control groups on both observed and unobserved characteristics before the intervention begins. Any differences in outcomes between the two groups can thus be causally attributed to the intervention alone. A mixed-method implementation study described program implementation and provided context for the impact findings. The following section describes in more detail sample recruitment and randomization, data collection methods, outcomes for the impact analyses, baseline equivalence of the study groups, and the analytic approach for both the impact and implementation studies.

3.1 Sample Recruitment

Teachers and their students were recruited for the study from schools across Hennepin County over two school-year cohorts (2011–2012 and 2012–2013).
Classroom teachers were to be trained to co-facilitate TOP® and considered part of the intervention, so teachers were the unit of random assignment and the focus of recruitment efforts each year. To arrive at the final pooled sample of teachers eligible for random assignment, recruitment began with schools that served students in middle and high school grades from the eight cities with the highest teen birth rates in Hennepin County (Brooklyn Center, Brooklyn Park, Minneapolis, New Hope, Crystal, Robbinsdale, Hopkins, and Richfield). The recruitment pool consisted of public charter schools as well as school districts and their affiliated Area Learning Centers (ALCs).[6] Hennepin County prioritized two types of schools for recruitment:

1) Larger schools with many classes and relatively large class sizes, to help meet the needs of the study and program participation goals
2) Schools with existing relationships with community-based organizations providing TOP®[7]

[6] An Area Learning Center (ALC), sometimes referred to as an Alternative Learning Center, provides comprehensive educational services to students who are off-track for graduation and are working towards completing their graduation requirements. ALCs serve enrolled secondary students primarily but can serve students in middle grades as well.

[7] Four school districts were prioritized due to their large size, one because of prior relationships; nine schools that participated in Cohort 1 were prioritized for recruitment in Cohort 2.

Table A.1 in Appendix A summarizes the outcome of the school recruitment process for Cohorts 1 and 2 combined. Overall, the target area included 111 schools. Thirty of these schools expressed interest in implementing the TOP® program for the 2011–2012 or 2012–2013 school year. Once a school's administration expressed interest in including the program as part of its regular school curriculum, the school contact worked with Hennepin County or the CBO partner to identify a class subject targeting students primarily in grades seven through ten that could incorporate the TOP® program once per week.

The eligibility criteria for random assignment were set prior to randomization: TOP® classes needed to span the school year with the same student cohort and also be of sufficient length to complete a lesson from the TOP® Changing Scenes® curriculum each week. Teachers of the identified class subjects must not have been previously trained in the TOP® curriculum (because they self-selected into the intervention), and the majority of the students in a class must be able to complete the baseline survey in English or Spanish unassisted. Across the two cohorts, the 30 schools that expressed interest in implementing the TOP® program identified 76 teachers. Of the 76 teachers, 13 did not meet the eligibility criteria. This resulted in a pooled sample of 63 teachers from 25 schools eligible for random assignment.[8]

[8] The first cohort consisted of 23 teachers from 11 schools. The second cohort consisted of 40 additional teachers from 22 schools (8 continuing schools and 14 new schools).

At the student level, all students enrolled in a study teacher's class at the time of the baseline survey were eligible to participate in the study if they had: 1) active written parental consent; 2) written personal assent; 3) the ability to move through the survey in English or Spanish unassisted; and 4) for Cohort 2, no prior participation in TOP®.
3.2 Random Assignment

The teachers scheduled by the school to teach the identified classes were randomly assigned to either co-facilitate TOP® in that class once per week (treatment) or to implement the curriculum that would have been used in the absence of TOP® (business as usual control group). Evaluation staff randomly assigned the 63 eligible teachers to the treatment (36) or control (27) groups within schools using the random number generator in the SAS statistical software package.[9] Within each participating school, half of the eligible teachers were randomized to the treatment group. In schools with an odd number of eligible teachers, we assigned the greater proportion of teachers to the treatment group. The probability of assignment to the treatment group ranged from .50 to .66. Because the program implementation approach required classroom teachers to be trained in the TOP® curriculum, the teachers were randomly assigned during the summer months so that teachers assigned to the treatment condition could complete TOP® curriculum training and incorporate the intervention into their lesson plans before the start of the school year.

[9] Teachers from schools with only one eligible teacher were pooled and randomly assigned. For schools with multiple teachers and two grade levels, evaluation staff randomly assigned teachers within each grade level.

Study procedures were designed to minimize the possibility of selection bias in how students were assigned to teachers. The same parental consent process was used across all study teachers' classes, including the timing, script, staff, and forms. The form asked parents for their permission to allow their child to participate in the study, and clearly stated that by providing written consent their child might or might not be offered the TOP® program. Students whose parents did not give permission for the study were ultimately offered the program if they were scheduled into a treatment teacher's class.

The point of notification about teacher random assignment occurred after the consent process and baseline surveys were complete in a school; school scheduling staff, students, and parents were unaware of the teachers' study group status until that time. Within a school, the CBO program staff person, study teacher(s), and relevant school administrator(s) were instructed (both in person and via written communication) not to communicate to students, parents, or scheduling staff before the completion of the baseline data collection about which teachers would be providing TOP®. Therefore, neither the assignment of students to teachers nor parental consent should have been influenced by whether or not teachers were assigned to offer the TOP® program.[10]

[10] Nine teachers (six treatment, three control) from Cohort 1 remained eligible in Cohort 2 and retained their random assignment status. Cohort 2 students were enrolled into these teachers' classes according to standard school procedures without regard to the teachers' study group status. Self-selection into these teachers' classes for Cohort 2 is unlikely due to the following factors: (1) the three control teachers taught in schools that did not have treatment teachers; (2) for five treatment teachers, there was no other teacher in the school to select for that subject and grade level (e.g., a small charter school with one health teacher); further, two of these five were ninth grade teachers whose students were not enrolled in the school the year before because the school starts with grade nine; and (3) for one treatment teacher, the alternative teacher for that subject in the school was also a treatment teacher.

Students were scheduled into the identified classes according to regular school procedures at the start of the school year before random assignment status was known. Students were scheduled into classes systematically (e.g., the school computer system assigned all of the students in a particular grade into the social studies classes using a pre-specified algorithm, or every other student on an alphabetical roster was assigned to one of two life skills teachers).[11]

[11] Information about student scheduling procedures is from self-reported information collected from schools by the grantee.
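As a concrete illustration of the assignment procedure described above, the sketch below performs within-school random assignment of teachers, sending the larger share of an odd-sized school to the treatment group. It is an illustration only, not the study's SAS program: the roster, column names, and seed are hypothetical, and the refinements noted in the footnotes (pooling single-teacher schools, randomizing within grade level) are omitted.

```python
# Illustrative sketch of within-school random assignment of teachers.
# Hypothetical roster and column names; the study used SAS's random
# number generator, and refinements such as pooling single-teacher
# schools and randomizing within grade level are omitted here.
import math
import random

import pandas as pd


def assign_within_schools(roster: pd.DataFrame, seed: int = 2011) -> pd.DataFrame:
    """Randomly assign teachers to treatment (1) or control (0) within each school.

    When a school has an odd number of eligible teachers, the larger share
    goes to the treatment group, so the probability of assignment to
    treatment is .50 or higher, as in the Hennepin County study.
    """
    rng = random.Random(seed)
    out = roster.copy()
    out["treatment"] = 0
    for _, school_group in out.groupby("school"):
        ids = list(school_group.index)
        rng.shuffle(ids)
        n_treat = math.ceil(len(ids) / 2)  # majority to treatment if odd
        out.loc[ids[:n_treat], "treatment"] = 1
    return out


# Example with a toy roster of eligible teachers
teachers = pd.DataFrame(
    {"school": ["A", "A", "A", "B", "B"]},
    index=["T01", "T02", "T03", "T04", "T05"],
)
print(assign_within_schools(teachers))
```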
TOP® implementation began upon completion of baseline data collection in a given school. For most schools, the first TOP® class took place during the first two weeks of October each year. The TOP® sessions ended in June 2012 (for Cohort 1) and June 2013 (for Cohort 2), approximately nine months later.

3.3 Data Collection

Impact evaluation data were collected via student surveys at three points: baseline, a short-term follow-up, and a longer-term follow-up. The short-term follow-up point was at the beginning of the subsequent school year, when study participants were no longer in their original class groups. Data on program fidelity, experiences of the control group, and factors that may have affected implementation were collected on an ongoing basis throughout the study period to document program implementation and provide context for the impact findings.

3.3.1 Impact Evaluation

Evaluation staff collected all three survey waves in the same manner across treatment and control groups using a Web-based survey and a combination of group administration and online self-administration. Paper surveys were used as back-up for baseline data collection when access to the Web survey was unavailable.[12] To maximize response rates and engagement in the study over time, survey respondents received a gift card incentive for each completed survey and were contacted three times between each survey wave to update their contact information. Table A.2 in Appendix A provides an overview of the data collection schedule. Table A.3 in Appendix A summarizes the data collection procedures, including the mode, incentives, and staff involved at each data collection point.

[12] Overall, 31 percent of the analytic sample used to answer the primary research question took the paper version of the baseline survey (26 percent of the control group and 34 percent of the treatment group).

3.3.2 Implementation Study

Fidelity to the program model was assessed through measures of adherence and quality. To assess adherence, evaluators collected the following information from program records: the number of weekly sessions offered, the duration of the TOP® intervention cycle across classes, attendance at weekly sessions, CSL hours completed by students in the treatment group, the facilitator-to-student ratio, and the extent to which a consistent facilitator was maintained for each class. Quality of implementation was assessed on two dimensions: student perceptions of staff-participant interactions
The quality of implementation data were collected from treatment group members’ responses to eight items on the short-term survey that asked them to rate their experiences with the program. 13 The study team collected data on the counterfactual condition from questions on the short- term student survey about receipt of information about sexual health and community service participation during the first follow-up period. Finally, periodic interviews with program staff provided information about the overall context of the implementation, such as other teen pregnancy prevention programming available in the study schools, external events affecting implementation, any unplanned adaptations to the program model, and implementation challenges. Table B.1 in Appendix B summarizes the data sources used to assess the core implementation elements, including the frequency of data collection and the staff responsible for collection. 3.4 Outcomes for Impact Analyses The primary research question is answered with a single-item dichotomous measure from the short-term follow-up survey: “In the past three months, have you had sexual intercourse, even once?” This measure of recent sexual activity captures the effect of offering TOP® on the full sample of youth, whether they were sexually active at baseline or not. That is, it includes both delayed sexual initiation (for those who were sexually inexperienced at baseline) and the decision to not have sex (for those who were sexually experienced at baseline or became sexually active during the follow-up period). The secondary research questions are answered using the same outcome from the long-term follow-up survey, as well as two additional single-item dichotomous measures from both the short- and long-term surveys, as shown in Table 3.1. All dichotomous measures are constructed as dummy 13 Observations of TOP® sessions, which were a requirement of the grant, were conducted by a training and technical assistance organization certified by the program developer to provide curriculum training and not included as part of the implementation study. Abt Associates June 26, 2015 ▌13 STUDY DESIGN variables where youth who respond “yes” to the question are coded as 1 and those who respond “no” are coded as 0. Table 3.1. Behavioral outcomes used for primary and secondary research questions Outcome name Description of outcome Timing of measure relative to program Primary outcome Recent sexual activity “In the past three months, have you had sexual 3 months post-program intercourse, even once?” Secondary outcomes Recent sexual activity “In the past three months, have you had sexual 15 months post-program intercourse, even once?” Recent unprotected sex “In the past three months, have you had sexual 3 and 15 months post- intercourse without you or your partner using any program [effective] type of birth control?” Ever had sex “Have you ever had sexual intercourse?” (for 3 and 15 months post subgroup of sexually inexperienced at baseline) program Notes: Youth who had never had sex were coded as 0 (“no”) on all outcomes. Effective types of birth control included condoms, birth control pills, the shot (Depo Provera), the patch, the ring (NuvaRing), and the IUD (Mirena or Paragard). 3.5 Creation of the Analytic Sample Table C.1 in Appendix C depicts the flow of sample members from the beginning of the study through the follow-up surveys that were used to address the primary and secondary research questions. 
3.5 Creation of the Analytic Sample

Table C.1 in Appendix C depicts the flow of sample members from the beginning of the study through the follow-up surveys that were used to address the primary and secondary research questions. As described in Section 3.2, 63 teachers from 25 schools were randomly assigned. All but two of these teachers participated in the study, resulting in a total of 61 teachers from 24 schools.[14] Eligibility criteria, including parental consent, were met by 71 percent (N=1,644) of the students enrolled in the study teachers' classes at the time of the baseline survey; these students were the focus of subsequent data collection efforts. Out of these eligible sample members, 96 percent (n=1,580) completed the baseline survey (treatment group n=972 and control group n=608). Parents and students were not informed of the random assignment status of the teachers until after completion of the consent and baseline survey processes.[15]

[14] One treatment teacher and one control teacher from the same school decided not to participate after random assignment but before baseline data were collected from their students.

[15] Out of all students enrolled at the time of the baseline survey, including those for whom parent consent was not obtained and who thus were not eligible to participate in the study, 68 percent (67 percent treatment and 70 percent control) completed baseline surveys.

Out of all sample members with parental consent, 74 percent (n=1,223) responded to the primary outcome measure at the short-term follow-up (treatment n=763 and control n=460).[16] The attrition rate at the short-term follow-up was 26 percent, with differential attrition of 2.0 percentage points.[17] The final analytic sample size for the short-term follow-up was 1,223 students. For the longer-term follow-up, 73 percent (n=1,196) of students with parental consent responded to the secondary outcome measures (treatment n=751 and control n=445).[18] The attrition rate at the long-term follow-up was thus 27 percent, with differential attrition of 3.0 percentage points.[19] The final analytic sample size for the long-term follow-up was 1,196 students.

[16] Out of all students enrolled in the study teachers' classes at the time of the baseline survey, including non-consented students, 52 percent of the treatment group and 53 percent of the control group responded to the primary outcome measure on the short-term follow-up survey.

[17] The overall attrition rate for the sample of consented and non-consented youth at first follow-up is 47 percent, with differential attrition of 1.0 percentage point.

[18] Out of all students enrolled in the study teachers' classes at the time of the baseline survey, including non-consented students, 51 percent of the treatment group and 52 percent of the control group responded to the secondary outcome measures on the long-term follow-up survey.

[19] The overall attrition rate for the sample of consented and non-consented students at longer-term follow-up is 49 percent, with differential attrition of 0.1 percentage points.

In general, the students in the analytic sample were in early adolescence, racially and ethnically diverse, and not engaging in sexual risk-taking behavior at baseline. A little over one-half (55 percent) were female, with an average age of 13.7 years. Black (non-Hispanic) and white (non-Hispanic) youth were represented in equal proportions (27 percent each), and 18 percent identified as Hispanic. The majority attended a traditional public middle or high school (72 percent) and spoke English at home (90 percent).
Eighty-three percent had never had sex at baseline; 88 percent had not had sex recently, with "recently" defined as the three months before the baseline survey.

3.6 Baseline Equivalence

We conducted baseline equivalence tests for the short-term and long-term analytic samples to assess whether attrition affected the comparability of the treatment and control groups.[20] The statistical models for assessing baseline equivalence have the same structural form as the models used to estimate impacts. Specifically, we tested for treatment-control differences on the baseline value of each outcome variable for the primary and secondary research questions, as well as for the following demographic variables: age, sex, race/ethnicity, and sexual experience at baseline. We used a multilevel model to account for the clustering of students with teachers and indicator (or "dummy") variables to account for the randomization of teachers within schools.

[20] The attrition rates met the Teen Pregnancy Prevention Evidence Review threshold for low attrition at both follow-up points.

Tables 3.2 and 3.3 summarize the key baseline measures for the analytic samples, which consist of students who responded to the primary and secondary outcome measures on the short-term and long-term follow-up surveys, respectively. There are no significant differences (p < .05) between the treatment and control groups on the key baseline characteristics for either analytic sample.

Table 3.2. Summary statistics of key baseline measures for students responding to the short-term follow-up survey

Baseline measure | TOP®: adjusted mean or proportion (SD) | Control: adjusted mean or proportion (SD) | Adjusted group difference | p-value of difference
Age (years) | 13.78 (.20) | 13.72 (.22) | 0.06 | 0.81
Sex (female) | 0.551 | 0.552 | -0.001 | 0.96
Race/ethnicity: White | 0.272 | 0.255 | 0.017 | 0.62
Race/ethnicity: Black | 0.273 | 0.281 | -0.008 | 0.84
Race/ethnicity: Hispanic | 0.161 | 0.198 | -0.037 | 0.13
Race/ethnicity: Asian | 0.134 | 0.114 | 0.020 | 0.34
Race/ethnicity: Other | 0.162 | 0.154 | 0.008 | 0.78
Ever had sex | 0.173 | 0.165 | 0.008 | 0.86
Recently sexually active | 0.124 | 0.124 | 0.000 | 0.99
Recent unprotected sex | 0.031 | 0.043 | -0.012 | 0.46
Sample size | 763 | 460 | |

Note: Analytic sample size reflects those with non-missing values on the primary outcome measure.

Table 3.3. Summary statistics of key baseline measures for students responding to the long-term follow-up survey

Baseline measure | TOP®: adjusted mean or proportion (SD) | Control: adjusted mean or proportion (SD) | Adjusted group difference | p-value of difference
Age (years) | 13.80 (.18) | 13.70 (.21) | 0.10 | 0.68
Sex (female) | 0.555 | 0.560 | -0.005 | 0.86
Race/ethnicity: White | 0.278 | 0.256 | 0.022 | 0.58
Race/ethnicity: Black | 0.287 | 0.284 | 0.003 | 0.95
Race/ethnicity: Hispanic | 0.157 | 0.210 | -0.053 | 0.11
Race/ethnicity: Asian | 0.129 | 0.116 | 0.013 | 0.54
Race/ethnicity: Other | 0.155 | 0.135 | 0.020 | 0.54
Ever had sex | 0.176 | 0.148 | 0.028 | 0.47
Recently sexually active | 0.120 | 0.121 | -0.001 | 0.98
Recent unprotected sex | 0.033 | 0.035 | -0.002 | 0.91
Sample size | 751 | 445 | |

Note: Analytic sample size reflects those with non-missing values on the secondary outcome measures.
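The exact equations for the baseline equivalence tests and impact models are given in Appendices F and G, which are not reproduced in this section. As an illustration only, a two-level model of the general form described above could be written as follows; the notation is introduced here and may differ from the report's specification:

```latex
% Illustrative sketch only; notation is introduced here and may differ
% from the specification in Appendices F and G.
\[
Y_{ij} = \alpha + \beta T_{j} + X_{ij}'\gamma + \sum_{s}\delta_{s}S_{js} + u_{j} + \varepsilon_{ij},
\qquad u_{j} \sim N(0,\tau^{2}), \quad \varepsilon_{ij} \sim N(0,\sigma^{2}),
\]
```

where Y_ij is the outcome for student i in teacher j's class (or, for the baseline equivalence tests, the baseline measure), T_j indicates assignment of teacher j to TOP®, X_ij is a vector of student-level baseline covariates, S_js are indicators for the school (or group of schools) within which teacher j was randomized, u_j is a teacher-level random effect capturing clustering, and the coefficient on T_j is the regression-adjusted treatment-control difference.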
3.7 Analytic Approach

To answer the primary research question, we used an intent-to-treat (ITT) framework and data collected at the short-term follow-up to estimate the average impact of TOP®, relative to the control group, on participants' sexual activity. An ITT analysis estimates the impact of the program on all eligible students who were enrolled in a treatment teacher's TOP® class, regardless of the level of program participation.[21] The impact estimate is the regression-adjusted difference between the average outcomes of students in treatment teachers' classes and students in control teachers' classes.[22] Impact estimates with p-values less than 0.05 (two-tailed test) are considered statistically significant and provide evidence that there are likely true differences between the groups as a result of TOP®.

[21] Most treatment group members participated in at least some of the program. Approximately one percent received no programming, due to being transferred out of the class after the day of the baseline survey but before the first program session.

[22] Impacts on dichotomous outcomes were estimated with a linear probability model for ease of interpretation. Appendix E presents the results of sensitivity analyses using a two-level logistic regression model.

The analytic approach used regression modeling to adjust for two aspects of the design. First, because teachers were randomly assigned, a multilevel model accounted for the clustering of students with teachers.[23] Second, the impact models included dummy variables to account for teachers being randomized within schools or within a group of schools. In addition, student-level baseline characteristics (sex, age, race/ethnicity, school-year cohort, and the baseline value of the outcome) were included as covariates in the impact models to increase the statistical precision and power of the impact estimates. For the detailed model specification, see Appendix G.

[23] Adjustments for clustering account for the statistical non-independence within groups of students enrolled in each teacher's class. If no adjustment for clustering is made, the standard error of the estimated impact will be incorrect and the statistical significance of impact estimates may be overstated.

Missing data occurred at both the baseline and follow-up data collection points. To account for missing baseline covariates, we applied the dummy variable method (Puma, Olsen, Bell, & Price, 2009). For missing outcome data, non-response weights were applied to give more weight to respondents who were underrepresented in the analytic sample compared to the full baseline sample.[24] Missing outcomes were not imputed. The prevalence of missing baseline covariates is described in Appendix J. For a description of how the non-response weights were constructed, see Appendix H.

[24] Weights were applied to the data using the weight statement in SAS PROC MIXED.

The analytic approach for the secondary research questions mirrored the approach used for the primary research question, except for one subgroup analysis where we tested whether TOP® differentially impacted students depending on their sex. For this analysis, we created an interaction term for treatment status conditioned on the subgroup indicator variable (e.g., 1 = female). The estimated coefficient for the interaction term measures the differential impact of the treatment between male and female adolescents.
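The sketch below strings together, on synthetic data, simplified versions of the three estimation steps described above: the dummy variable method for a missing baseline covariate, logit-based non-response weights, and a weighted linear probability model with school dummies. It is an approximation rather than the study's analysis: the study fit a two-level model with a teacher-level random effect in SAS PROC MIXED, whereas this sketch substitutes teacher-clustered standard errors, and every variable name and value here is hypothetical.

```python
# Simplified, synthetic-data sketch of the Section 3.7 analysis steps;
# not the study's SAS PROC MIXED specification.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1200
df = pd.DataFrame({
    "school_id": rng.integers(0, 8, n),
    "teacher_id": rng.integers(0, 40, n),
    "treatment": rng.integers(0, 2, n),
    "female": rng.integers(0, 2, n),
    "age": rng.normal(13.7, 1.0, n),
    "cohort": rng.integers(0, 2, n),
    "baseline_outcome": rng.binomial(1, 0.12, n).astype(float),
    "responded": rng.binomial(1, 0.74, n),
    "recent_sex_followup": rng.binomial(1, 0.20, n),
})
df.loc[rng.random(n) < 0.05, "baseline_outcome"] = np.nan  # some missing values

# (1) Dummy variable method for a missing baseline covariate: add a
#     missingness indicator and fill the covariate with a constant.
df["baseline_missing"] = df["baseline_outcome"].isna().astype(int)
df["baseline_outcome"] = df["baseline_outcome"].fillna(0)

# (2) Non-response weights: logit model of follow-up response on baseline
#     covariates; respondents are weighted by the inverse of their
#     predicted response probability.
response_model = smf.logit(
    "responded ~ treatment + female + age + cohort + baseline_outcome",
    data=df,
).fit(disp=False)
df["nr_weight"] = 1.0 / response_model.predict(df)
respondents = df[df["responded"] == 1].copy()

# (3) Weighted linear probability model of the short-term outcome with
#     school (randomization block) dummies and baseline covariates.
#     Teacher-clustered standard errors stand in for the teacher-level
#     random effect the study estimated.
impact_model = smf.wls(
    "recent_sex_followup ~ treatment + female + age + cohort"
    " + baseline_outcome + baseline_missing + C(school_id)",
    data=respondents,
    weights=respondents["nr_weight"],
).fit(cov_type="cluster", cov_kwds={"groups": respondents["teacher_id"]})

print(impact_model.params["treatment"], impact_model.pvalues["treatment"])
```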
3.7.1 Implementation Study

Data collected to answer the implementation study research questions on adherence, quality, the counterfactual, and context were analyzed using descriptive statistics to characterize the level of implementation. To assess adherence to the program model, the key measures were:

• median number of weekly sessions offered and received;
• median number of CSL hours received;
• percentage of students completing 25 or more weekly sessions and 20 or more CSL hours; and
• average number of consecutive months TOP® sessions were held.

To measure quality of staff-participant interactions, two composite variables were created. The first measures the extent to which participants felt their TOP® teacher was caring and understanding and is derived from the percentage of treatment group respondents whose average combined score on three survey items was 3 or more on a scale of 1–4, where 1 = "No, not at all" and 4 = "Yes, very much." The items were: "My TOP® teacher cared about me," "…understood me," and "…supported and accepted me." The second variable measures the extent to which participants agreed that their TOP® class was a safe, values-neutral environment. It was constructed in the same manner and is based on two survey items: "When I was at TOP® I could say what I think and talk about my life," and "I felt physically safe during TOP® sessions."

The quality of student engagement with the program was measured by a composite variable representing the extent to which participants agreed that TOP® was youth-driven and engaging. Constructed in the same manner as the above two variables, the items were: "I felt like I belonged at TOP®," "I enjoyed the community service part of TOP®," and "I helped plan my community service project." Due to survey non-response, quality measures may not be representative of all TOP® participants. For a complete description of each implementation data element and how it was quantified, please see Appendix D.

4. Study Findings

The two goals of the evaluation were to (1) determine if TOP® had favorable impacts on students' level of sexual activity, and (2) understand how TOP® was implemented to provide context for the impact findings. Section 4.1 presents the results of the implementation study, followed by findings from the impact analyses to determine the overall effectiveness of the intervention.

4.1 Implementation Study Findings

The implementation study focused on four areas: the extent to which the program adhered to program fidelity standards and was delivered with quality, as well as the experiences of the control group and any contextual circumstances that substantially affected implementation. The analysis found that, in general, TOP® was delivered as intended in accordance with the model; however, many students did not receive the minimum dosage of CSL, and the "business as usual" condition shared some similarities with the treatment condition.

4.1.1 Adherence to Program Model

Adherence includes measures of how much of the program was offered to participants, how much was received by participants, and who delivered the material to participants. The intended program dosage for each participant is a minimum of 25 weekly TOP® sessions (one per week at 40–50 minutes each) and at least 20 hours of CSL over nine months.
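These dosage benchmarks translate directly into per-student adherence flags. The sketch below shows one way such measures could be computed from attendance records in Python; it is illustrative only, the data frame and column names (sessions_attended, csl_hours) are hypothetical placeholders, and the published adherence figures and the crosstab reported in Table 4.1 below come from the program's PMRS data rather than from this code.

```python
# Hypothetical sketch of per-student adherence measures (illustrative only).
import pandas as pd
from scipy.stats import chi2_contingency

att = pd.read_csv("attendance_records.csv")  # placeholder: one row per treatment student

# Dosage thresholds from the TOP model: 25+ weekly sessions and 20+ CSL hours.
att["met_sessions"] = att["sessions_attended"] >= 25
att["met_csl"] = att["csl_hours"] >= 20

print("Median sessions attended:", att["sessions_attended"].median())
print("Median CSL hours:", att["csl_hours"].median())
print("Met 25-session minimum:", f"{att['met_sessions'].mean():.0%}")
print("Met 20-hour CSL minimum:", f"{att['met_csl'].mean():.0%}")
print("Met both minimums:", f"{(att['met_sessions'] & att['met_csl']).mean():.0%}")

# Association between the two thresholds (a crosstab of the kind shown in Table 4.1).
table = pd.crosstab(att["met_csl"], att["met_sessions"])
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(table)
print(chi2, p)
```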
The dosage offered by program staff in this instance was generally consistent with the TOP® model. Across TOP® classes, students were offered a minimum of 25 weekly sessions, with a median of 29 sessions. The median class period length was 50 minutes, and the average duration of TOP® was 8.2 months.

The dosage received by treatment group members did not consistently meet the expectations of the program model. Treatment group members attended a median of 27 weekly sessions, with 67 percent meeting or exceeding the minimum dosage of 25 sessions. The median number of CSL hours completed by the treatment group was 18, with 39 percent completing the minimum 20 hours. The percentage of treatment group members who attended at least 25 sessions and completed a minimum of 20 CSL hours was 35 percent. Weekly session attendance was associated with completion of CSL hours; of those with at least 20 hours of CSL, 89 percent also had attended at least 25 weekly sessions (see Table 4.1).

Table 4.1. Crosstab of weekly session attendance by CSL hours completed

 | < 25 weekly sessions | 25+ weekly sessions | Total
< 20 hours CSL | 220 (47%) | 244 (53%) | 464
20+ hours CSL | 32 (11%) | 267 (89%) | 299
Total | 252 | 511 | 763
X² (2, N = 763) = 110.79, p < .01
Note: Percentages are row percentages.

Through key informant interviews with program staff, the implementation study found that CSL was particularly challenging to implement in accordance with the model's fidelity criteria. Common challenges included fitting in 20 hours of CSL and 25 weekly TOP® sessions when the time allotted to the program was often limited to less than an hour a week during the school day. Program staff also reported challenges helping students choose meaningful service projects that could be accomplished without leaving the school in cases where off-site service work was not feasible, and maintaining group continuity over the full school year when some students did not attend school regularly or transferred out during the year.

Lastly, the program model requires that all classes keep the consistent presence of at least one trained facilitator throughout the full program year and maintain a ratio of no more than 25 students per trained facilitator. All TOP® classes in the treatment group met or exceeded these standards, with an average student-to-staff ratio of 14:1.

4.1.2 Quality of Implementation

Student participants perceived high-quality interactions with staff and high engagement with the program. Specifically, 86 percent of the treatment group responding to the first follow-up survey agreed that their TOP® facilitator was caring and understanding; 85 percent agreed that their TOP® class was a safe, values-neutral environment. Almost three-fourths (73 percent) agreed that TOP® was engaging and youth-driven.

4.1.3 Experiences of the Control Group

Survey findings from the control group students at the first follow-up suggest that TOP® was implemented in service-rich settings. Over one-half reported receiving information within the past year on relationships and dating (76 percent), reproduction (75 percent), abstinence (67 percent), how to say no to sex (66 percent), STDs (65 percent), and birth control methods (53 percent). The most common source of the information was a class, workshop, or event at school. The treatment group tended to report higher rates of receiving sexual health information than the control group at first follow-up (see Appendix K).25

25 These self-reported rates increased across all topics for the control group at the second follow-up, while remaining steady or increasing slightly for the treatment group.
More than 40 percent of the study participants reported community service participation unrelated to TOP® (41 percent control and 43 percent treatment) during the prior 12 months. Of the control group members who reported this, about one-half (48 percent) spent between one and nine hours on these projects. Twenty-nine percent spent 20 or more hours. Treatment group members reported very similar amounts of time spent on non-TOP® service projects (50 percent spent up to nine hours, and 28 percent spent 20 or more hours).

4.1.4 Context

The schools contributing sample members for the study did not have youth development programs in place with the specific intensity and duration of TOP®. However, several schools provided resources and opportunities to students that were similar in nature. Twelve schools offered school-wide community service or service learning opportunities unrelated to TOP®, and 12 offered at least one of the following four mechanisms for students to access sexual health information (unrelated to TOP®): (1) presentations and other services by non-school staff, (2) sex education curriculum, (3) puberty/anatomy information, or (4) sexual-health-related elective classes. Nine schools offered both a school-wide community service/service learning opportunity and at least one type of formal sexual health education. If treatment teachers taught classes where sexual health information was already offered, TOP® supplemented these activities.

There were no external events that substantially affected implementation during the study period. The grantee requested and was granted one unplanned adaptation to implement TOP® for eight months instead of the full nine months. This was necessary in a subset of schools to accommodate the parental consent process and baseline survey administration at the start of the 2011–2012 and 2012–2013 school years, before the first TOP® sessions for the treatment group.

4.2 Impact Study Findings

Table 4.2 shows the estimated effect of TOP® on the primary outcome measure. There is no evidence that TOP® caused changes in the likelihood of engaging in sexual activity. At the short-term follow-up, 14 percent of treatment group members reported having had sex recently, compared to 15 percent of the control group. The estimated impact (1.0 percentage point) is not statistically significant (p = 0.68) and indicates there is likely no true difference between the two groups.

Table 4.2. Estimated effect using data from short-term survey to address the primary research question

Outcome | TOP® adjusted mean or % | Control adjusted mean or % | Treatment effect (p-value of difference)
Recently sexually active | 0.143 | 0.153 | -0.01 (0.68)
Source: Follow-up surveys administered 3 months post-programming.
Notes: Recently sexually active is defined as "had sex in the past 3 months." See Chapter 3 for a description of the impact estimation methods.

4.2.1 Secondary Research Questions

Table 4.3 summarizes the findings for the secondary research questions. First, there is no evidence that TOP® caused changes in the prevalence of recent unprotected sex at either follow-up point.
While the short-term findings indicate a 3.1 percentage point difference on this outcome favoring the treatment group, this difference was not statistically significant, and the difference shrank to less than one percentage point at the long-term follow-up. Second, consistent with the finding for the primary research question, there is no statistically significant difference between the percentage of treatment (16.8 percent) and control (19.1 percent) group members engaging in recent sex at the long-term follow-up. Third, TOP® had no impact at either follow-up point on delaying sexual activity among the subgroup of students who were sexually inexperienced at baseline. Given that the large majority of the full sample (83 percent) was sexually inexperienced at baseline, this is consistent with the finding for the primary research question. Finally, the average impacts of TOP® on recent sexual activity did not differ between male and female participants in the short-term (p = .65) or long-term (p = .09). There also were no differences in recent unprotected sex between male and female adolescents in the short-term (p = .52) or long-term (p = .08). The average impacts for each subgroup are shown in Table 4.3 below.

Table 4.3. Estimated effects using data from the short- and long-term surveys to address secondary research questions

Outcome measure | Short-term: TOP® adjusted mean or proportion | Short-term: Control adjusted mean or proportion | Short-term: Treatment effect (p-value of difference) | Long-term: TOP® adjusted mean or proportion | Long-term: Control adjusted mean or proportion | Long-term: Treatment effect (p-value of difference)
Recent unprotected sex | 0.041 | 0.072 | -.031 (0.31) | 0.063 | 0.066 | -0.003 (0.90)
Recently sexually active | - | - | - | 0.168 | 0.191 | -.023 (0.48)
Ever had sex (subgroup: sexually inexperienced at baseline) | 0.10 | 0.077 | .024 (0.33) | 0.20 | 0.147 | .052 (0.16)
Recently sexually active (subgroup: girls) | 0.165 | 0.174 | -0.009 (0.85) | 0.165 | 0.206 | -.041 (0.39)
Recently sexually active (subgroup: boys) | 0.153 | 0.167 | -0.014 (0.75) | 0.164 | 0.186 | -0.022 (0.63)
Recent unprotected sex (subgroup: girls) | 0.037 | 0.079 | -0.042 (0.22) | 0.078 | 0.063 | 0.015 (0.68)
Recent unprotected sex (subgroup: boys) | 0.054 | 0.076 | -0.022 (0.51) | 0.054 | 0.081 | -0.027 (0.56)
Source: Follow-up surveys administered 3 and 15 months post-programming.
Notes: Recently sexually active is defined as "had sex in the past 3 months." Unprotected sex is defined as sex in the past 3 months without the use of effective birth control. Analyses were not adjusted for multiple comparisons. See Chapter 3 for a description of the impact estimation methods.

To ascertain whether the results were sensitive to the analysis approach, we conducted additional analyses using alternative approaches. These included (1) using multilevel logistic regression models for dichotomous outcomes, (2) removing non-response weights, (3) setting to missing any inconsistent responses across baseline and follow-up survey waves, and (4) removing individual-level covariates. Across all alternative model specifications, findings were consistent with those found using the benchmark approach (see Appendix E for a summary of the sensitivity analyses).
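As one illustration of these robustness checks, the hedged sketch below refits the primary outcome two ways: once without student-level covariates, and once under a logistic specification. Because statsmodels has no simple formula interface for a random-intercept logistic model, the sketch uses a GEE with exchangeable within-teacher correlation as a stand-in for the report's two-level logistic model; it is not the study's actual specification, the column names are hypothetical placeholders, and the non-response weights are omitted.

```python
# Hypothetical sensitivity checks on the primary outcome (illustrative only).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("analytic_sample.csv")  # placeholder file name

# Sensitivity: drop student-level covariates, keep block dummies and teacher clustering.
no_cov = smf.mixedlm(
    "recent_sex ~ treat + C(block)", data=df, groups=df["teacher_id"]
).fit()

# Sensitivity: logistic specification; GEE with exchangeable within-teacher correlation
# stands in for the report's two-level logistic regression model.
logit_gee = smf.gee(
    "recent_sex ~ treat + female + age + C(race) + C(cohort)"
    " + recent_sex_baseline + C(block)",
    groups="teacher_id",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
).fit()

print(no_cov.params["treat"], no_cov.pvalues["treat"])
print(logit_gee.params["treat"], logit_gee.pvalues["treat"])  # log-odds scale
```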
Conclusion

This study is one of the first rigorous evaluations of TOP® since the original randomized controlled trial found favorable impacts on teen pregnancy almost 20 years ago (Allen et al., 1997). Since that time, the program has expanded nationwide and is one of the most widely replicated teen pregnancy prevention programs: OAH funded 17 replications of TOP® in 2010, and the program developer reported that TOP® was implemented in more than 350 communities in 31 states in 2012 (Wyman National Network, 2012).

Based on data from a sample of approximately 1,200 students from 24 middle and high schools in Hennepin County, Minnesota, there were no impacts on sexual risk-taking behaviors at either the short- or long-term follow-up point. Students in the treatment group were no less likely than the control group to report engaging in recent sexual activity or recent unprotected sex. Among the subgroup of students who were sexually inexperienced at baseline, those who were offered TOP® were no more likely than the control group to delay sexual initiation. The program was generally delivered as intended; however, many students did not receive the minimum dosage of CSL, and the "business as usual" control condition may have shared some features of the treatment condition.

That the study was unable to find convincing evidence that TOP® reduced sexual risk-taking behaviors is inconsistent with the findings from Allen et al. (1997). While the two studies employed different study designs and occurred almost 20 years apart, it is noteworthy that positive results were not replicated with a larger sample and on behavioral outcomes that are more prevalent in the population than pregnancy. In the remainder of this section, we present potential explanations for the divergent results, suggestions for further research that can address new questions generated by this study, and the limitations of the study.

First, the demographic characteristics and baseline risk level of the two samples were markedly different. In the Allen et al. (1997) study, the sample was predominantly female, African American, and almost 16 years old on average at baseline, whereas the current study sample was closer to 50 percent female and included a more racially and ethnically diverse group of teens closer to 14 years old on average. Less than one-fifth of the current study sample had engaged in sexual activity at baseline and just 3 percent had ever been pregnant, while in the Allen et al. (1997) study 6 percent of the treatment group and 10 percent of the control group had been pregnant. TOP® is meant to be a universal prevention program for the youth population, but this study was not able to detect any effects on sexual risk-taking behavior among the sample of mostly young, low-risk youth at the selected Hennepin County schools. Further research would be needed to test whether the program is able to impact sexual risk-taking behaviors among older youth, who are also more likely to be sexually active or thinking about becoming so (Allen & Philliber, 2001).

While the underlying theory of change is consistent across both implementations, the CSL components may have been structured differently. The CSL component of the earlier implementation appears to have included longer-term volunteer placements in community settings in collaboration with local CBOs, and the intervention itself was offered in a mix of in-school and after-school settings.
Moreover, students in the earlier study averaged 45.8 hours of service, with the median participant completing 35 hours of service (Allen et al., 1997, p. 731). Compared with Hennepin County's median of 18 CSL hours, and given the challenges some TOP® participants experienced in finding time outside the school day for service projects, it could be argued that a more intensive service learning experience might elicit an impact. However, non-experimental research suggests that the number of CSL hours is of less importance than the quality of CSL in predicting positive outcomes for TOP® (Allen, Kuperminc, Philliber, & Herre, 1994), and the program developer states that off-site service work is not necessary for high-quality CSL (Wyman National Network, 2014). Nonetheless, this program component in particular was shaped by the circumstances of each school setting, some of which allowed off-site service projects and some of which did not; this points to the need for further research on the conditions under which high-quality, meaningful, youth-driven service experiences occur. The circumstances and conditions that can affect the overall quality of CSL include the logistical constraints of the setting, the developmental level of the students, and the consistency of student attendance over the nine months of the program. The relative importance of the weekly session and CSL-hour doses also requires further study in an experimental framework, given the mixed findings of prior research in this area (Allen, Philliber, & Hoggson, 1990; Allen et al., 1994).

Another consideration is that many of Hennepin County's implementation settings were service-rich environments; the effect the program might have in more disadvantaged settings is unknown. Several of the study schools offered, as standard practice, opportunities for learning about sexual health and for contributing to the schools and communities through service. Implementing broad prevention programs in settings where other programs already exist is common. However, this situation creates a tougher standard for the program under study to meet; the intervention must produce impacts that are above and beyond what is already being generated in its absence. Future research could test the impact of TOP® in lower-resource communities and schools where TOP® is likely to fill a larger gap in services.

Limitations of the study include limited external validity and potential contamination of the control group within schools. First, since the study schools were not a representative sample of all schools in the targeted eight cities within Hennepin County, the results cannot be generalized beyond the specific schools and youth that agreed to participate in the study. Second, because some schools included both treatment and control group teachers, the control group students may have had some exposure to the concepts taught by TOP® through associations with treatment group students. While this type of contamination is not measurable in our study, the nature of such exposure is indirect and excludes the core components of the program (i.e., weekly peer group sessions, CSL, and positive adult guidance and support). This suggests that any control group contamination would have been minor relative to the exposure to TOP® received by the treatment group.
Abt Associates June 26, 2015 ▌29 CONCLUSION Finally, the absence of impacts found in this study should be interpreted in the context of six other rigorous evaluations of TOP® funded simultaneously through OAH. The results of all seven studies present a unique opportunity for policymakers, practitioners, and researchers alike to learn about the program’s effectiveness across a series of studies in different settings and with different populations. Abt Associates June 26, 2015 ▌30 REFERENCES 5. References Allen, J. P., Kuperminc, G. P., Philliber, S., & Herre, K. (1994). Programmatic prevention of adolescent problem behaviors: the role of autonomy, relatedness, and volunteer service in the Teen Outreach Program. American Journal of Community Psychology, 22(5), 617–638. Allen, J. P., & Philliber, S. (2001). Who benefits most from a broadly targeted prevention program? Differential efficacy across populations in the Teen Outreach Program. Journal of Community Psychology, 29(6), 637–655. Allen, J. P., Philliber, S., Herrling, S., & Kuperminc, G. P. (1997). Preventing teen pregnancy and academic failure: Experimental evaluation of a developmentally based approach. Child Development, 68(4), 729-742. Allen, J.P., Philliber, S., & Hoggson, N. (1990). School-based prevention of teenage pregnancy and school dropout: Process evaluation of the national replication of the Teen Outreach Program. American Journal of Community Psychology, 8, 505-524. Goesling, B., Colman, S., Trenholm, C., Terzian, M., & Moore, K. (2014). Programs to reduce teen pregnancy, sexually transmitted infections, and associated sexual risk behaviors: A systematic review. Journal of Adolescent Health, 54(5), 499 – 507. Hennepin County Public Affairs. (2013). Hennepin County Fact Sheet. Retrieved from http://www.hennepin.us/~/media/hennepinus/your-government/overview/ Documents/ HC_FastFacts_fs_Sep_2013.pdf on March 1, 2015. Mathematica Policy Research & Child Trends. (2010). Identifying programs that impact teen pregnancy, sexually transmitted infections, and associated sexual risk behaviors. Review Protocol Version 1.0. Minnesota Organization on Adolescent Pregnancy, Parenting, and Prevention. (2010). 2008 County Teen Pregnancy and Birth Data. Puma, M. J., Olsen, R. B., Bell, S. H., & Price, C. (2009). What to Do When Data Are Missing in Group Randomized Controlled Trials (NCEE 2009-0049). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Schochet, P. Z. (2010). Is regression adjustment supported by the Neyman model for causal inference? Journal of Statistical Planning and Inference,140, 246–259. Wyman National Network (2014). Community Service Learning Resource Guide. November. Abt Associates June 26, 2015 ▌31 APPENDICES 6. Appendices Abt Associates June 26, 2015 ▌32 APPENDICES Appendix A: Data Collection Efforts Table A.1. Outcome of teacher recruitment effort (Cohorts 1 and 2 pooled) Recruitment result Number of unique schools Number of unique teachers Total number of schools serving 111 NA target population in eight cities Unresponsive to recruitment efforts 44 NA Declined participation 37 NA Successfully recruited, but teachers 5 13 ineligible for random assignment Successfully recruited and teachers 25 63 eligible for random assignment NA = not applicable ® Table A.2. 
Timing of data collection efforts used in the impact analysis of TOP Timing Data collection effort Cohort 1 Cohort 2 Baseline survey September 2011 September 2012-October 2012 Start date of programming October 2011 October 2012 End date of programming June 2012 June 2013 Short-term follow-up September 2012–January 2013 August 2013–November 2013 Long-term follow-up August 2013–November 2013 August 2014–November 2014 Abt Associates June 26, 2015 ▌33 APPENDICES ® Table A.3. Summary of data collection procedures used in the impact analysis of TOP Data Collection Points Parent Baseline 6 Month Short-term 18-Month Long-term Consent Tracking follow-up Tracking follow-up Survey mode Paper Self- Self- Self- Self- Self- signature administered administered administered administered administered form and survey in web survey, web survey; web survey web survey; parent school/group paper subset in or telephone subset in brochure setting contact group group setting form, or setting telephone Survey reminder NA NA Email, letter, Email, letter, Email, letter, Email, letter, mode text text text text message message, message message, telephone telephone Incentive $5 Target gift $15 Target $5 Pizza Hut $25 Target $10 CVS $30 Target card for gift card eGiftCard for eGiftCard eGiftCard eGiftCard student Cohort 1; $10 CVS eGiftCard for Cohort 2 Staff involved Trained Evaluation Trained Evaluation Evaluation Evaluation program staff staff program staff (for staff staff (for and staff and group group evaluation evaluation settings settings only) staff staff only) Treatment/control None None None None None None differences in procedures Note: A subsample of Cohort 2 non-respondents who were unreachable in schools for the long-term survey was offered an increased incentive of $50. Twenty-nine Cohort 2 participants (18 treatment, 11 control) received this increased incentive. NA = not applicable. Abt Associates June 26, 2015 ▌34 APPENDICES Appendix B: Implementation Study Data Sources Table B.1 Data used to address implementation research questions Implementation element Types of data used to assess whether the Frequency/sampling of data Party responsible for element of the intervention was implemented as collection data collection intended Adherence All sessions delivered CBO staff How many sessions were All sessions offered are captured in performance offered? How often were measure reporting system (PMRS) sessions offered? Length (number of minutes) of class periods kept in program records N/A CBO staff Duration (number of months) of program from All session dates CBO staff session dates in PMRS Daily attendance records (includes # of CSL hours Student attendance at all sessions is CBO staff What and how much of the completed per participant) recorded in PMRS program was received? ® List of facilitators assigned to each TOP club Data on all program staff is available to Grantee staff Who delivered material to maintained in program records grantee staff youth? 
Ratio of trained staff to students kept in program records ® List of staff hired and trained to facilitate TOP Quality Follow-up survey questions answered by treatment 12 months after baseline; all treatment Evaluation staff Quality of staff-participant group members on extent to which program was: group members responding to survey interactions -Delivered by caring & understanding facilitator -Delivered in safe environment -Values neutral Abt Associates March 27, 2015 ▌35 APPENDICES Implementation element Types of data used to assess whether the Frequency/sampling of data Party responsible for element of the intervention was implemented as collection data collection intended Follow-up survey questions answered by treatment 12 months after baseline; all treatment Evaluation staff Quality of youth engagement group members on extent to which program was: group members responding to survey with program -Youth driven -Engaging Counterfactual Follow-up survey questions answered by control 12 and 24 months after baseline; all Evaluation staff Experiences of control group members on receipt of information about control group members responding to condition sexual health, relationships, and CSL participation each survey Context Interviews with subset of school staff Once during study period to purposively Evaluation staff Other TPP programming selected sample available or offered to study Template provided by evaluator and completed by participants (both intervention school-based CBO staff Once per year during study period, all CBO staff and comparison) schools Interviews with grantee and program staff Once per year for two years Evaluation staff External events affecting implementation Weekly calls with grantee staff Weekly throughout study period Evaluation staff Adaptation requests Annually/ad hoc Grantee staff Substantial unplanned adaptation(s) Interviews with CBO and grantee staff Once per year for two years Evaluation staff CSL = Community Service Learning. Abt Associates March 27, 2015 ▌36 APPENDICES Appendix C: Study Sample Flow Table C.1. Cluster and youth sample sizes by intervention status Time period Total Intervention Control Total Intervention Control sample size sample size sample size response response response rate rate % rate % Number of Clusters (teachers) 1. At beginning of study 63 36 27 2. Contributed at least one youth at Baseline 61 35 26 96.8 97.2 96.3 baseline 3. Contributed at least one youth at 3 months post- 61 35 26 96.8 97.2 96.3 short-term follow-up programming 4. Contributed at least one youth at 15 months post- 61 35 26 96.8 97.2 96.3 long-term follow-up programming Number of Youth 5. In non-attrition clusters/sites at time 2,325 1,461 864 of baseline survey 6. Who consented and assented 1,644 1,016 628 70.7 69.5 72.7 7. Contributed a baseline survey Baseline 1,580 972 608 96.1 (68.0) 95.7 (66.5) 96.8 (70.4) << Parents and students notified of random assignment status after baseline survey administration >> 8. Contributed a short-term follow-up 3 months post- 1,223 763 460 74.4 (52.6) 75.1 (52.2) 73.2 (53.2) response to primary outcome programming 9. Contributed a long-term follow-up 15 months post- 1,196 751 445 72.7 (51.4) 73.9 (51.4) 70.9 (51.5) response to secondary outcome programming Notes: Nine teachers (6 treatment, 3 control) from the 2011–2012 cohort remained eligible in 2012–2013 and retained their random assignment status across both study cohorts. 
Students were enrolled into these teachers’ classes according to standard school procedures without regard to the teachers’ study group status. The numbers in row 5 reflect the total number of students enrolled in study teachers’ classrooms at the time of the baseline survey, including those who were not Abt Associates June 26, 2015 ▌37 APPENDICES eligible for the study due to lack of parental consent (681 students). Parents and students were blind to the random assignment status of teachers until after baseline survey administration. In rows 7, 8, and 9 the percentages in parentheses reflect the response rates when non-consented youth are included in the denominator. Abt Associates June 26, 2015 ▌38 APPENDICES Appendix D: Implementation Study Methods Table D.1. Methods used to address implementation research questions Implementation element Methods used to address each implementation element Adherence ® How many sessions were offered? The median number of weekly sessions offered across TOP clubs captured in the PMRS. How often were sessions offered? ® Median session duration: the median class period length in which TOP was placed, measured in minutes. ® Average duration of program: the average number of consecutive months in which sessions were offered across TOP classes. Median of the number of sessions each treatment group student attended. What and how much was received? Percentage of students completing 25 or more sessions: the number of students attending 25 or more sessions divided by the total number of students in the treatment group. Median number of CSL hours that each treatment group student completed. Percentage of students completing 20 or more CSL hours: the number of students completing 20 or more CSL hours divided by the total number of students in the treatment group. ® Consistent facilitator for nine months: the percentage of TOP classes that had at least one trained facilitator retained for Who delivered material to the program’s full nine month duration. students? The ratio of trained facilitators to students: divide the number of students by the number of trained facilitators. Report the ® percentage of TOP classes that meet the minimum ratio of 1:25. Count of all staff trained staff members implementing for SY 2011–2012 and 2012–2013. Quality The percentage of treatment group students reporting that the program was delivered by caring and understanding CBO Quality of CBO staff-participant facilitator, in a safe environment, in a values-neutral way. interactions Quality of youth engagement with The percentage of treatment group students reporting that the program was youth-driven and engaging. program Counterfactual Experiences of counterfactual The data on experiences of the counterfactual at follow-up will be presented as means and percentages. condition Abt Associates June 26, 2015 ▌39 APPENDICES Implementation element Methods used to address each implementation element Context Other TPP programming available All of the TPP-related programming available to both intervention and comparison groups described by program and school or offered to study participants staff is grouped into categories; the number of schools falling into each category is reported. (both intervention and counterfactual) External events affecting Any external events affecting implementation are reported. implementation The approved adaptation from nine months to eight months program duration is described. Substantial unplanned adaptation(s) PMRS = Performance measure reporting system. 
CSL = Community Service Learning. CBO = Community-based Organization. TPP = Teen Pregnancy Prevention. Abt Associates June 26, 2015 ▌40 APPENDICES Appendix E: Summary of Sensitivity Tests To test whether the results presented in the report were sensitive to researcher decisions about how data were cleaned and analyzed, we conducted four sensitivity analyses. Table E.1 provides an overview of the components of each analysis. All approaches account for two design effects: the clustering of students within teachers’ classes and for the randomization of teachers within schools or a group of schools. Each sensitivity analysis tests the robustness of an individual component of our benchmark approach. The first sensitivity analysis tests whether a logistic regression model produces comparable results to the linear probability model. The second sensitivity analysis mirrors the benchmark approach with the exception that we did not apply non-response weights to account for missing outcome data. This sensitivity analysis examines whether the impact estimates for the un-weighted analytic sample are comparable to the impact estimates that are “weighted-up” to the full baseline sample.26 The third analysis tests whether the benchmark findings are replicated when inconsistent responses between baseline and follow-up surveys are set to missing. The final analysis assesses the effect of including student-level baseline covariates in the model. While including baseline covariates in the impact model is standard practice, there is some debate about the effects of doing so (Schochet, 2010). Table E.1. Overview of sensitivity analyses Benchmark Sensitivity Sensitivity Sensitivity Sensitivity analysis analysis 1 analysis 2 analysis 3 analysis 4 Linear probability model  Logistic    26 Non-response weights give more weight to respondents who are underrepresented in the analytic sample compared to the full baseline sample. Abt Associates June 26, 2015 ▌41 APPENDICES Benchmark Sensitivity Sensitivity Sensitivity Sensitivity analysis analysis 1 analysis 2 analysis 3 analysis 4 Non-response weights   Unweighted   Inconsistent responses between surveys Set inconsistent left “as-is”    responses to  missing Student-level baseline covariates  No student-level    covariates Adjustments for clustering and randomization blocks      Table E.2 presents the findings from the sensitivity analyses conducted on the primary research question, followed by the secondary research questions in Table E.3. For all outcomes, the results do not depart significantly from those produced by the benchmark analyses presented in the main body of the report. 27 27 An additional sensitivity test (not shown) was run excluding Cohort 2 students who enrolled in the classes of nine teachers (6 treatment and 3 control) who kept their random assignment status from Cohort 1. The results did not differ substantively from those produced by the benchmark analysis. Abt Associates June 26, 2015 ▌42 APPENDICES Table E.2. Estimated effects using data from short-term follow-up to address the primary research question Benchmark analysis Logistic Un-weighted Set inconsistent No student-level responses to missing covariates Diff. (SE) p-value Odds p-value Diff. (SE) p-value Diff. (SE) p-value Diff. 
(SE) p-value Ratio Recently sexually -.01 .68 .91 .66 -.01 (.020) .47 .004 .85 -.001 (.036) .78 active (.026) (.023)
Notes: The benchmark approach used: the linear probability model for dichotomous outcomes, non-response weights created with the propensity score stratification method, inconsistent responses between baseline and follow-up left "as-is," student-level baseline covariates, and adjustments for clustering and randomization strata.

Table E.3. Estimated effects using data from short-term and long-term follow-up to address secondary research questions

Outcome measure | Benchmark analysis: Diff. (SE), p-value | Logistic: Odds Ratio, p-value | Un-weighted: Diff. (SE), p-value | Set inconsistent responses to missing: Diff. (SE), p-value | No student-level covariates: Diff. (SE), p-value
Recent unprotected sex (short-term) | -.031 (.030), .31 | .729, .39 | -.030 (.027), .27 | -.023 (.031), .45 | -.029 (.032), .36
Recent unprotected sex (long-term) | -.003 (.028), .90 | .757, .31 | -.004 (.023), .86 | .017 (.029), .56 | -.007 (.028), .81
Recently sexually active (long-term) | -.023 (.033), .48 | .755, .16 | -.014 (.028), .63 | .006 (.028), .84 | -.027 (.045), .55
Ever had sex (short-term), subgroup: sexually inexperienced at baseline | .024 (.024), .33 | 1.22, .67 | .015 (.024), .53 | .024 (.024), .33 | .011 (.024), .64
Ever had sex (long-term), subgroup: sexually inexperienced at baseline | .052 (.037), .16 | 1.32, .23 | .033 (.035), .35 | .056 (.037), .14 | .032 (.036), .38
Recently sexually active (short-term), subgroup: females | -.008 (.043), .85 | .902, .75 | -.019 (.033), .57 | .015 (.041), .72 | -.018 (.035), .60
Recently sexually active (short-term), subgroup: males | -.015 (.045), .75 | .923, .82 | -.007 (.036), .84 | -.010 (.050), .85 | -.006 (.039), .88
Recently sexually active (long-term), subgroup: females | -.041 (.047), .39 | .676, .18 | -.025 (.044), .58 | -.008 (.043), .86 | -.028 (.05), .58
Recently sexually active (long-term), subgroup: males | -.023 (.047), .63 | .788, .43 | -.015 (.033), .64 | .035 (.048), .46 | -.018 (.044), .68
Recent unprotected sex (short-term), subgroup: females | -.042 (.034), .22 | .422, .06 | -.040 (.028), .15 | -.035 (.035), .32 | -.044 (.029), .12
Recent unprotected sex (short-term), subgroup: males | -.021 (.032), .51 | .875, .81 | -.005 (.028), .87 | -.018 (.032), .58 | -.006 (.030), .85
Recent unprotected sex (long-term), subgroup: females | .015 (.036), .68 | .882, .77 | .010 (.028), .73 | .027 (.035), .44 | .003 (.026), .91
Recent unprotected sex (long-term), subgroup: males | -.028 (.047), .56 | .544, .16 | -.022 (.038), .56 | -.002 (.061), .97 | -.032 (.038), .40

Appendix F: Equation for Estimating Baseline Equivalence

The following model was used to test for treatment-control differences on the baseline value of each outcome measure for the primary and secondary research questions, as well as for the following baseline demographic measures: age, sex, race/ethnicity, and sexual experience. We used a multilevel model to account for the clustering of students with teachers and dummy variables to account for the randomization of teachers within school blocks.

(1) Level 1: $Y_{ij} = \beta_{0j} + \varepsilon_{ij}$

(2) Level 2: $\beta_{0j} = \gamma_0 + \gamma_1 T_j + \sum_{m=1}^{M} \gamma_m D_{mj} + \mu_j$

At level 1 (individual level):
Y_ij is the baseline demographic or behavioral measure for student i in cluster j.
β_0j is the mean value of the baseline measure in cluster j.
ε_ij is the residual error for student i in cluster j, which is assumed to be independently and identically distributed.

At level 2 (level of randomization):
γ_0 is the global mean of the baseline measure.
γ_1 is the coefficient of interest, which represents the estimated difference between the treatment and control groups.
T_j is a dummy variable equal to 1 if teacher j was assigned to the treatment group.
D_mj are dummy variables representing the randomization strata.
μ_j is the residual error for teacher j, which is assumed to be independently and identically distributed.

Appendix G: Impact Model Specification

Impact models for primary and secondary research questions

Individual outcomes are modeled at level 1, while level 2 represents the level of cluster randomization (teachers).

(1) Level 1: $Y_{ij} = \beta_{0j} + \sum_{k=1}^{K} \beta_{kij} X_{kij} + \varepsilon_{ij}$

(2) Level 2: $\beta_{0j} = \gamma_0 + \gamma_1 T_j + \sum_{m=1}^{M} \gamma_m D_{mj} + \mu_j$

At level 1 (individual level):
Y_ij is the outcome of interest for student i in cluster j.
β_0j is the mean value of the outcome measure in cluster j.
β_kij is the estimated coefficient for the kth baseline characteristic for student i in cluster j.
X_kij is the kth baseline characteristic for student i in cluster j (e.g., = 1 for female).
ε_ij is the residual error for student i in cluster j, which is assumed to be independently and identically distributed.

At level 2 (level of randomization):
γ_0 is the global mean of the outcome measure.
γ_1 is the coefficient of interest, which represents the estimated impact of treatment.
T_j is a dummy variable equal to 1 if teacher j was assigned to the treatment group.
D_mj are dummy variables representing the randomization strata.
μ_j is the residual error for teacher j, which is assumed to be independently and identically distributed.

The coefficient on the treatment variable, γ_1, is the primary coefficient of interest. We test whether the estimate of this coefficient is statistically significant at the 5 percent level using a two-tailed test. If the estimated coefficient is statistically significant, we interpret this as evidence that offering TOP® affected the outcome. If the estimated coefficient is not statistically significant, we conclude that there is no evidence that offering TOP® affected the outcome.

Subgroup impact model for secondary research questions about male-female differences

The following regression model tests for subgroup differences for the secondary outcomes.

(3) $Y_{ij} = \beta_{0j} + \beta_1 T_j + \sum_{k=1}^{K} \beta_{kij} X_{kij} + \sum_{m=1}^{M} \gamma_m D_{mj} + \gamma_k T_j X_{kij} + \mu_j + \varepsilon_{ij}$

Most of the terms in Equation (3) are equivalent to those in Equations (1) and (2). The main changes are:
β_1 is the estimated average impact for the reference category of the subgroup (e.g., female).
γ_k tests whether there is a differential impact of the treatment between the two categories of the subgroup (e.g., male or female).
β_1 + γ_k is the estimated impact for the other category in the subgroup (e.g., male).

Appendix H: Non-Response Weights

To account for missing outcome data on the primary and secondary research questions, we created weights for each respondent using the propensity score stratification method (see Puma et al., 2009). We fit the same impact models as we originally specified (for the complete cases), and applied the weights to the data using the weight statement in SAS PROC MIXED.
This approach gives more weight to respondents who are underrepresented in the analytic sample compared to the full baseline sample. The steps used to calculate the weights under this approach were as follows:

1. Divide sample members into four groups based on their study group status (treatment or control) and presence of baseline data (i.e., has baseline data, does not have baseline data due to survey non-response): (1) treatment-has baseline, (2) treatment-no baseline, (3) control-has baseline, (4) control-no baseline.

2. For the groups with no baseline data (Groups 2 and 4), compute the average response rate within the group (between 0 and 1) and set the weight for each student to the inverse of the average response rate for all students in that group.28 This creates two weights: one for Group 2, treatment-no baseline (w_TNB), and one for Group 4, control-no baseline (w_CNB).

3. For each study group that had baseline data (Groups 1 and 3), estimate a single-level logit model of response propensity as a function of (1) dummy variables for the teacher clusters, (2) demographics, and (3) other baseline measures that are plausibly expected to affect the likelihood of response. To account for missing baseline covariates (due to item non-response), apply the dummy variable method.29 See Table H.1 below for a description of the covariates used in each model. The model takes the form

$\text{logit}(Y_i) = \beta_0 + \sum_{t} \gamma_t D_{ti} + \sum_{k} \beta_{ki} X_{ki} + \varepsilon_i$

where Y_i is the response probability for student i; β_0 is the estimated intercept; D_ti are dummy variables representing the teacher cluster to which student i belongs; γ_t are the estimated coefficients for the tth teacher cluster; β_ki is the estimated coefficient for the kth baseline characteristic for student i; X_ki is the kth baseline characteristic for student i (e.g., = 1 for female); and ε_i is the residual error for student i, which is assumed to be independently and identically distributed.

4. Compute estimated response probabilities for each student in Groups 1 and 3.

5. Within each group, divide the sample, including both respondents and non-respondents, into quintiles based on their estimated survey response probabilities.

6. Compute the average response rate (between 0 and 1) for each quintile.

7. Set the weight w_ij for each student to the inverse of the response rate for all students in the same quintile. This creates 10 different weights: 5 weights for Group 1 (w_T1, w_T2, w_T3, w_T4, and w_T5) and 5 weights for Group 3 (w_C1, w_C2, w_C3, w_C4, and w_C5).

8. Scale the weights so that the sum of the weights equals the total sample size (N = 1,644).30

28 We did not estimate response probabilities for students in these two groups using a logit model of response because they did not have any baseline data to explain their probability of follow-up response.
29 If two dummy variable indicators of missing data were highly collinear, we removed one of the variables from the model. Sensitivity tests demonstrated that excluding these dummy variables from the model did not significantly change the estimated response probabilities.
30 The weights for the short-term follow-up ranged from 0.70 (min) to 3.66 (max), with a mean of 0.89 and a median of 0.78. At the long-term follow-up, weights ranged from 0.66 (min) to 4.59 (max), with a mean of 0.86 and a median of 0.76.
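The sketch below illustrates these steps in Python; it is a hypothetical rendering rather than the study's code (the report states the weights were applied in SAS PROC MIXED), and the column names (responded, treat, has_baseline, teacher_id, and the example covariates) are placeholders. Table H1 below lists the covariates actually used in the response-propensity models.

```python
# Hypothetical sketch of propensity-score-stratification non-response weights (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("baseline_sample.csv")  # placeholder file name
df["weight"] = np.nan

# Steps 1-2: groups without baseline data get the inverse of their group response rate.
no_base = ~df["has_baseline"].astype(bool)
for arm in (0, 1):
    mask = no_base & (df["treat"] == arm)
    if mask.any():
        df.loc[mask, "weight"] = 1.0 / df.loc[mask, "responded"].mean()

# Steps 3-7: for each arm with baseline data, model response propensity, form quintiles of
# predicted probabilities, and weight by the inverse of each quintile's response rate.
for arm in (0, 1):
    mask = ~no_base & (df["treat"] == arm)
    grp = df.loc[mask].copy()
    logit = smf.logit(
        "responded ~ C(teacher_id) + female + age + ever_sex_baseline",  # example covariates
        data=grp,
    ).fit(disp=0)
    grp["phat"] = logit.predict(grp)
    grp["quintile"] = pd.qcut(grp["phat"], 5, labels=False, duplicates="drop")
    rates = grp.groupby("quintile")["responded"].mean()
    df.loc[mask, "weight"] = 1.0 / grp["quintile"].map(rates)

# Step 8: scale so the weights sum to the total baseline sample size.
df["weight"] *= len(df) / df["weight"].sum()
```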
Table H1. Baseline covariates used in logit models of response probability

Baseline covariate | Description
Cohort | 1 = Cohort 1, 2 = Cohort 2
Teacher | Teacher ID
Age | Student's age at baseline
Female | 1 = Yes, 0 = No
Hispanic | 1 = Yes, 0 = No
Asian | 1 = Yes, 0 = No
Black | 1 = Yes, 0 = No
White | 1 = Yes, 0 = No
Other | 1 = Yes, 0 = No
FRP lunch | Student gets FRP lunch: 1 = Yes, 0 = No
Whole life | Student has lived in U.S. for whole life: 1 = Yes, 0 = No
English only | Student speaks only English at home: 1 = Yes, 0 = No
Parents' education | Student's parents have at least some college experience: 1 = Yes, 0 = No
School attachment | Mean scale of 3 items
School engagement | Mean scale of 2 items
School performance | 1 = Mostly As, 2 = Mostly Bs, 3 = Mostly Cs, 4 = Mostly Ds, 5 = Mostly Fs, 6 = I don't get letter grades
Participate in pro-social activities | Participate in pro-social activities at least 3 or more days per week: 1 = Yes, 0 = No
Civic awareness | Mean scale of 3 items
Civic efficacy: planning | Ability to plan: mean scale of 5 items
Civic efficacy: action | Ability to take action: mean scale of 4 items
General self-efficacy | Mean scale of 3 items
Trust others | Mean scale of 3 items
Ever had sex | 1 = Yes, 0 = No
Sex in past 3 months | 1 = Yes, 0 = No
Unprotected sex in past 3 months | 1 = Yes, 0 = No
Ever been pregnant | 1 = Yes, 0 = No
Intend to have sex next year | 1 = No, definitely not; 2 = No, probably not; 3 = Yes, probably will; 4 = Yes, definitely will
Intend to use condom | 1 = No, definitely not; 2 = No, probably not; 3 = Yes, probably will; 4 = Yes, definitely will

Appendix I: Approaches to Inconsistent Survey Responses

There were two types of inconsistent data encountered during data preparation: inconsistent responses within the baseline survey and inconsistent responses across the baseline and follow-up surveys. There were no inconsistent responses within each follow-up survey because all follow-up surveys were administered online and the skip patterns were programmed to eliminate the possibility of inconsistent responses. The baseline survey, on the other hand, was administered online (with pre-programmed skip patterns) and on paper (where it was possible for participants to provide inconsistent responses). In total, less than 1 percent of the baseline sample (N=1,644) provided inconsistent responses within the baseline survey alone.

To address these inconsistencies, we accepted the response to the gateway question as "correct" and set the follow-up response to missing. For example, if a respondent reported "Yes, I've had sexual intercourse in the past 3 months" and then reported "I've had sexual intercourse zero times in the past 3 months," we kept the response to the gateway question and set the reported count to missing. The justification for this approach is that if the participant had been taking the survey online, he/she would not have had the opportunity to provide an inconsistent response to the follow-up question (i.e., in the example above, the online survey would not have accepted "zero" as a valid response). While this does not guarantee that the gateway question is "correct," we expected a similar number of inconsistent responses across treatment and control groups before the intervention.

Across the three survey waves, five percent of the baseline sample (N=1,644) responded inconsistently about whether they had ever had sex. For these discrepancies, the benchmark approach was to leave the inconsistent responses "as-is" since the correct response was unknown.
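A minimal pandas sketch of the two handling rules described above follows; it is illustrative only, and the file and column names (had_sex_3mo, times_sex_3mo, ever_sex, id) are hypothetical placeholders rather than the study's variable names.

```python
# Hypothetical sketch of the two inconsistency-handling rules (illustrative only).
import numpy as np
import pandas as pd

base = pd.read_csv("baseline.csv")        # placeholder file names
follow = pd.read_csv("short_term.csv")

# Rule 1 (within the paper baseline survey): keep the gateway item, blank the inconsistent count.
bad_count = (base["had_sex_3mo"] == 1) & (base["times_sex_3mo"] == 0)
base.loc[bad_count, "times_sex_3mo"] = np.nan

# Rule 2 (across waves): flag students who report never having had sex at follow-up after
# reporting sexual experience at baseline. The benchmark leaves these responses "as-is";
# the sensitivity analysis sets them to missing.
merged = base[["id", "ever_sex"]].merge(follow, on="id", suffixes=("_base", ""))
cross_wave = (merged["ever_sex_base"] == 1) & (merged["ever_sex"] == 0)
sensitivity = merged.copy()
sensitivity.loc[cross_wave, "ever_sex"] = np.nan  # conservative specification only
```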
To test the robustness of this benchmark approach, we compared it to a more conservative approach of setting the inconsistent responses across survey waves to missing. Please see Appendix E for the summary of those analyses.

Appendix J: Prevalence of Missing Baseline Covariates

Table J.1. Prevalence of missing data for baseline covariates

Baseline covariate | % missing, Total (N=1,644) | % missing, TOP® (n=1,016) | % missing, Control (n=628)
Sex | 4.6 | 5.0 | 4.0
Age | 5.5 | 5.7 | 5.3
Race/ethnicity | 5.5 | 5.5 | 5.4
Ever had sex | 6.6 | 7.2 | 5.6
Recently sexually active | 6.8 | 7.4 | 5.7
Recent unprotected sex | 7.1 | 7.9 | 5.9
Notes: Includes both survey and item non-response. 3.9 percent of the total sample did not complete a baseline survey (4.3% of the treatment group, 3.2% of the control group).

Appendix K: Receipt of Sexual Health Information at Follow-Up

Table K.1. Percentage of participants who self-reported receiving sexual health information in the last 12 months, by treatment status

Sexual health information topic | Short-term follow-up: TOP® | Short-term follow-up: Control | Long-term follow-up: TOP® | Long-term follow-up: Control
Relationships and dating | 85% | 76% | 82% | 82%
Marriage/family life | 71% | 63% | 70% | 70%
Abstinence | 79% | 67% | 77% | 73%
Birth control methods | 68% | 53% | 74% | 70%
Where to get birth control | 62% | 44% | 70% | 65%
STDs | 81% | 65% | 80% | 77%
HIV/AIDS | 79% | 67% | 79% | 76%
How to talk to partner about sex | 56% | 45% | 66% | 63%
How to talk to partner about birth control | 53% | 40% | 65% | 59%
How to say no to sex | 77% | 66% | 78% | 75%
Reproduction | 82% | 75% | 81% | 80%
Source: Follow-up surveys administered 3 and 15 months post-programming.