Health Services Research
and Program Evaluation: Causal Inference and Estimation
Marcelo Coca Perraillon,
Rich Lindrooth,
and Donald Hedeker
This site is for a forthcoming book on statistical/econometric methods for health services research and program/policy evaluation to be published by Cambridge University Press around 2024. We'll post about 2 chapters on this site.
Some lecture
notes that are the basis for this book are available here (Rich’s lecture notes
are not online):
-
Perraillon’s
site at the University of Colorado Anschutz Medical Campus
-
Hedeker’s site at the
University of Chicago
NEW: Draft of Chapter 6 on maginal effects availabe. Comments welcome (citation suggestion at the bottom of the page).
Table of Contents
Preface
Part I: Prologue
Chapter 1: Overview
Part II: Foundations
Chapter 2: Basic statistics, linear regression, and nonparametric
models
Chapter 3: The potential outcomes framework
Part III: Estimators and interpretation
Chapter 4: Estimation I: Maximum likelihood, Generalized method of moments,
Bayesian estimation
Chapter 5: Estimation II: Variance and Standard errors
Chapter 6: Marginal effects to interpret regression models
Chapter 7: Generalized linear models (GLM)
Chapter 8: Discrete Choice Models
Part IV: Causal inference with observational data
Chapter 9: Alternatives to regression adjustment: Matching estimators and
propensity scores
Chapter 10: Longitudinal (panel data): fixed effects and random effects models
Chapter 11: Difference-in-differences
Chapter 12: Regression discontinuity design
Chapter 13: Instrumental variables
Appendix: Mathematical concepts
A description of the book (from
Preface)
This is a book on quantitative methods in
health services research, health economics, and health policy evaluation --
more generally referred to as “program evaluation.” Health services research is
a multidisciplinary field that examines the use, costs, quality, outcomes, and
other aspects of health care including the organization of health care markets.
Evaluating the impact of health policy is central to the field.
Quantitative analyses in health services
research apply methods and language developed in econometrics and
statistics/biostatistics. In most applications, the goal is to understand the
causal impact of policy changes or “treatments,” broadly defined, on a set of
outcomes. In most circumstances, however, randomized trials are either not
feasible or prohibitively expensive, and we must establish causality using
observational data; that is, data that were not collected as part of an
experiment. A key distinction between experiments and observational studies is
than in observational studies treatment assignment is not under the control of
the investigator.
Most readers have already learned that
correlation or association does not imply causation. The goal of causal
inference is to understand under which conditions correlation --or any other
measure of association-- does imply a causal effect. Thus, this book is about
the design of observational studies and the estimation of statistical models to
answer causal research questions. We also cover the necessary background
material to understand advanced methods. The background material is focused on
understanding the mechanics and properties of parametric and nonparametric
statistical models. These models are useful as descriptive and predictive
tools, but our ultimate goal is to use them to answer
causal research questions.
One feature of our book is that we separate
the design of an observational study from the estimation of statistical models.
The separation of design and estimation is one of the most valuable aspects of
the potential outcomes framework since causal effects are defined independently
of an estimation method. This approach is part of the ``new" causal
inference field in statistics, although causal inference has always been
central to econometrics. In the last two to three decades, these separate but
related fields have found plenty of common ground regarding causality. The new
part is a clear definition of causal effects and a mathematical notation based
on potential outcomes and counterfactuals that continues to expand and clarify
our understanding of established methods and facilitates the development of new
ones.
Our approach is based on the premise that
complex concepts are better understood when first introduced with intuitive
examples and graphs, followed by theory, and then practical applications using
statistical software. Based on our experience teaching graduate-level classes,
we think that students learn best by doing, and ``doing" means relating
the theory to application using statistical software.
We strive to present theory intuitively but
formally to show how the theory is applied and why methods work,
which is essential for understanding when specific methods should be
used and what meaning can be derived from the estimators. This is not
a “cookbook approach” book in the sense that we do not focus on rules for
specific situations. We do not shy away from presenting complex concepts
and mathematical notation because they are essential tools to develop intuition
on how and why statistical methods work. Mathematics is a language that makes
the job easier, not more difficult. Mathematics allows us to represent
ideas and concepts using symbols, and we manipulate these symbols to discover
new ideas and prove propositions that might not be self-evident. Manipulating
complex ideas in our minds without the use of symbols is much more difficult.
However, we always provide the intuition behind the mathematics to help
students understand how the symbols relate to ideas since not all students are
comfortable with mathematics. At the end of the course(s), students should be
able to understand the language of mathematics as it applies to statistical
analysis.
This book is intended for advanced
undergraduates, master’s students, and doctoral students in health services
research, health economics, public policy, and related fields. Students in
these disciplines come from diverse backgrounds with different levels of
preparation. We assume the same background that is commonly required for
admission to these programs: two semesters of calculus and introductory
statistics. A class on linear regression would be helpful, but not strictly
necessary since we review the essential features of linear models. The goal of
the mathematical appendix is to review the mathematical background needed to
understand the rest of the book. We hope that students do not skip the
introductory material. Each new method is based on previous concepts, and it is
a lack of mastering the basics (and the notation) that tends to trick students.
Some previous knowledge of Stata is helpful, although the background chapters
also serve as an introduction to Stata.
Key features of this book include:
·
Semantics Boxes that clarify how terms are
used in different disciplines. Because our field is multidisciplinary, the
terms we use can be confusing --sometimes comically so-- because the same terms
can have different definitions or because the same concept is named differently
in other fields.
·
Notation Boxes that clarify how mathematical
symbols are used in different disciplines or by different authors. As we said,
mathematics is a language, but it is a language with symbols that are not
standard and can be defined in different ways by different authors. We clarify
mathematical notation because notation can prevent students and practitioners
from understanding the underlying concepts. A variant of this theme is that
sometimes the notation is the result of giving statistical models an
interpretation tied to an underlying theory, so we also cover different ways of
understanding and/or deriving statistical models. We think students will be
better equipped to understand theoretical papers and more advanced textbooks if
they understand the notation.
·
Extensive examples using datasets to
illustrate real-life applications. One frustrating aspect of teaching health
services research methods is that we usually cannot use the same datasets that
are common in the field and our own research because Data Use Agreements do not
permit the distribution of these data. However, we have created multiple
datasets from publicly available sources and include datasets that authors have
made publicly available to reproduce published papers. Our goal is to use
datasets that reflect how practitioners work in our fields.
·
Stata code to reproduce all examples and
figures in the book. Some concepts are difficult to understand in theory but
are relatively easy to understand when implemented in practice (and vice
versa). We use Stata code as a tool for learning. In some cases, like graphs or
long output, not all of the code is in the book, but
it is available in the online supplemental material.
·
Stata version control. We prefer Stata because
it has the features we need and it has extensive
documentation and substantial technical support. These are among the reasons it
is the standard statistical software in our field. Another key feature of Stata
is that it is backwards compatible. Regardless of updates, commands will always
work provided the code includes a Stata version statement. This ensures that
our code will not become obsolete when new versions are released
or commands are updated. Most our code requires Stata 16.1, but some examples
require Stata 17. Each program file begins with a version statement.
·
Online supplemental material. The online
supplemental material includes R and SAS code to replicate most of the examples
in the book when possible, although some material is
specific to Stata.
·
End-of-chapter exercises to reinforce key
concepts.
·
End-of-chapter bibliographical notes with
references to books and papers where readers can find additional or
complementary material.
This book is also intended to be a tool for
faculty who teach quantitative methods and a reference for practitioners. We
wrote it because we could not find a textbook that fit the needs of students.
In our classes, we ended up assigning book chapters and papers that use
different notation and language, which makes both learning and teaching more
difficult. We had to complement those materials with extensive lecture notes
and ``translations" of notation, terms, and subject-matter. Our lecture
notes are the basis for this book.
Additional supplemental material for
instructors include:
·
Solutions to end-of-chapter exercises.
·
Most of the sample datasets contain additional
variables that are not part of our analyses. Instructors could use these
variables to expand problems sets or create examples focusing on different
research questions. In many cases, the variables have missing values. Most
textbooks use small sample datasets with non-missing values, but this does not
reflect the reality of how research is conducted, so we decided to retain
missing values in some of the datasets.
·
Lectures notes for most chapters. The lecture
notes focus on the most important parts of each chapter. These notes can be
used as a starting point for teaching with our book.
·
Errata. Despite multiple revisions and
editing, the presence of a mistake converges to 1 in probability given the
length of our book. We will post a complete list of errors by chapter as we
find them, including updates and clarification of some material.
We wrote the book with a two-semester
quantitative methods sequence in mind plus additional material for review. We
cover topics that should be the standard toolkit in health services research
and health/public policy doctoral programs as well as applied econometrics
courses in economics programs, although most of our examples are about health
care.
The book is divided into four parts. Parts I
and II introduce the major subjects we cover, including the potential outcomes
framework and a review of statistical concepts and linear regression. Part III
focuses on estimation and inference of statistical models, including
interpretation of model parameters (causal or not) and discussion of
nonparametric models. In other words, Part III discusses techniques to estimate
statistical models and the assumptions and properties of these models when
applied to a sample, without having to assume that findings from these
models have a causal interpretation. On the other hand, Part IV covers the
most important methods to estimate causal effects using observational data:
propensity scores and matching estimators as an alternative to regression
adjustment, longitudinal (panel) data, difference-in-differences, regression
discontinuity designs, and instrumental variables.
Two chapters are fundamental for students to
master: Chapter 3 on the potential outcomes framework
and Chapter 6 on marginal effects. Chapter 3 is the foundation to understand
the definition of causal effects and the identification of causal effects using
a sample, and it presents the potential outcome notation we use in the rest of
the book. Chapter 6 on marginal effects is essential for understanding the
interpretation of model parameters and to express model parameters in different
metrics regardless of whether the parameters have a causal interpretation. We
provide an overview of each chapter and their connections in Chapter 2.
We have tried to make the chapters as
self-contained (modular) as possible --particularly in Part IV-- so they can be
used independently, although this separation is artificial. We refer to other
material in the book when we think students would benefit from reading sections
in other chapters, but we tried to keep references to a minimum. Each chapter
progresses from simple to advanced, from known to unknown, and from concrete to
abstract without losing track of practical applications. Instructors could skip
the sections that appear towards the end of each chapter if they think the
material is too advanced for their students. However, we hope that all of the material is covered, time permitting. Often,
``advanced” really means ``unknown.” Most concepts are simple once we
understand them, and our understanding of ``sophisticated” changes with time.
What was a sophisticated method a decade ago could be a standard one now.
Book
citation:
Perraillon, Marcelo Coca, Richard Lindrooth, Donald Hedeker. Health Services
Research and Program Evaluation: Causal Inference and Estimation. Cambridge
University Press, forthcoming.
Or
Perraillon,
MC, Lindrooth, RM, Hedeker, D. Health Services Research and Program Evaluation:
Causal Inference and Estimation. Cambridge University Press,
forthcoming.
For the
additional material not in the textbook but available here:
Perraillon,
MC, Lindrooth, RM, Hedeker, D. Health Services Research and Program Evaluation:
Causal Inference and Estimation, Online Supplement. Cambridge University
Press, [insert date], https://www.perraillon.com/PLH.
© Perraillon, Lindrooth, Hedeker, 2021. No part of the materials available through the https://clas.ucdenver.edu/marcelo-perraillon/ and www.perraillon.com sites may be copied, photocopied, reproduced, translated or reduced to any electronic medium or machine-readable form, in whole or in part, without prior written consent of the author. Any other reproduction in any form without the permission of the author is prohibited. All materials contained on this site are protected by United States copyright law and may not be reproduced, distributed, transmitted, displayed, published, or broadcast without the prior written permission of author.