Marcelo Coca Perraillon

Book

Health Services Research and Program Evaluation: Causal Inference and Estimation

This page has information on my forthcoming book with Don Hedeker on statistical/econometric methods for health services research and program/policy evaluation to be published by Cambridge University Press around 2026. Under my agreement with the publisher, I'll post about 2-3 chapters on this site, but email me if you want earlier versions. Always happy to get feedback. The Preface below describes our approach and the book's features.

Everything in the book can be replicated in Stata and R. We chose Stata as the main package because it is the standard statistical software in health services research and health economics. Stata has version control, so code written for one release will continue to run in later releases and will not become obsolete. My lecture notes are on the Teaching page. Some of Don's lectures are here.

Citation

Perraillon, Marcelo Coca and Donald Hedeker. Health Services Research and Program Evaluation: Causal Inference and Estimation. Cambridge University Press, forthcoming.

Or:

Perraillon, MC, Hedeker, D. Health Services Research and Program Evaluation: Causal Inference and Estimation. Cambridge University Press, forthcoming.

For the additional material not in the textbook but available here:

Perraillon, MC, Hedeker, D. Health Services Research and Program Evaluation: Causal Inference and Estimation, Online supplement. Cambridge University Press, [insert date], https://www.perraillon.com/book.html

LaTeX (BibTeX):

      @misc{phbook,
        title={Health Services Research and Program Evaluation: Causal Inference and Estimation, Online supplement},
        author={Perraillon, Marcelo and Hedeker, Donald},
        howpublished={\url{https://www.perraillon.com/book.html}},
        year={2025}, 
        organization={Cambridge University Press}
      }

Preface
Part I: Prologue: Chapter 1: Overview
Part II: Foundations: Chapter 2: Linear regression and inference; Chapter 3: The potential outcomes framework
Part III: Estimators and interpretation: Chapter 4: Estimation I: Maximum likelihood, Generalized method of moments, Bayesian estimation; Chapter 5: Estimation II: Variance and Standard errors; Chapter 6: Marginal effects to interpret regression models; Chapter 7: Generalized linear models and cost data (GLM)
Part IV: Causal inference with observational data: Chapter 8: Alternatives to regression adjustment: Matching estimators and propensity scores; Chapter 9: Longitudinal (panel data): fixed effects and random effects models; Chapter 10: Difference-in-differences; Chapter 11: Regression discontinuity designs; Chapter 12: Instrumental variables
Appendix A: Mathematical concepts
Appendix B: Predictions with machine learning and Python*

This chapter is an introduction to machine learning for those who have a more traditional parametric statistics background (like OLS and maximum likelihood estimation instead of gradient descent).

Preface

Below is the first part of the Preface. Click here for the PDF version.

This is a book on quantitative methods in health services research, health economics, and health policy evaluation -- more generally referred to as "program evaluation." Health services research is a multidisciplinary field that examines the use, costs, quality, outcomes, and other aspects of health care including the organization of healthcare markets. Evaluating the impact of health policy is central to the field.

Quantitative analyses in health services research apply methods and language developed in econometrics and statistics or biostatistics. In most applications, the goal is to understand the causal impact of policy changes or "treatments," broadly defined, on a set of outcomes. In most circumstances, however, randomized trials are either not feasible or prohibitively expensive, and we must establish causality using observational data; that is, data that were not collected as part of an experiment. The main distinction between experiments and observational studies is that in observational studies treatment assignment is not under the control of the investigator. The consequence is that other factors besides treatment are not held constant so it is more difficult to establish causality.
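The point about treatment assignment can be illustrated with a small simulation. The sketch below is not taken from the book; the function name `simulate`, the confounder `severity`, and all numeric values are hypothetical choices made for illustration. It compares a naive difference in mean outcomes when sicker patients are more likely to be treated (observational) with the same comparison when treatment is assigned by a coin flip (randomized).

```python
import random
import statistics


def simulate(n=20000, true_effect=2.0, randomized=False, seed=0):
    """Return the naive difference in mean outcomes (treated minus control).

    All quantities are hypothetical: `severity` is an unobserved confounder
    that lowers the outcome and, in the observational case, also raises the
    chance of being treated.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        severity = rng.gauss(0, 1)
        if randomized:
            p_treat = 0.5  # investigator controls assignment
        else:
            p_treat = 0.8 if severity > 0 else 0.2  # sicker patients get treated
        t = rng.random() < p_treat
        # Outcome: treatment helps by `true_effect`, severity hurts.
        y = true_effect * t - 3.0 * severity + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return statistics.mean(treated) - statistics.mean(control)


naive_observational = simulate(randomized=False)
naive_randomized = simulate(randomized=True)
print(f"true effect:            {2.0:.2f}")
print(f"observational estimate: {naive_observational:.2f}")
print(f"randomized estimate:    {naive_randomized:.2f}")
```

Under these assumptions the randomized comparison lands close to the true effect, while the observational comparison is pulled well below it because severity is not held constant across the two groups.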

Most readers have already learned that correlation or association does not imply causation. The goal of causal inference is to understand under which conditions correlation -- or any other measure of association -- does imply a causal effect. Thus, this book is about the design of observational studies and the estimation of statistical models to answer causal research questions. Said another way, it is about the circumstances under which an approach can identify causal effects. However, we also cover the necessary background to understand advanced methods. The background material is focused on understanding the mechanics and properties of parametric and nonparametric statistical models. These models are useful as descriptive and predictive tools, but our ultimate goal is to use them to answer causal research questions.

One feature of our book is that we separate the design of an observational study from the estimation of statistical models. The separation of design and estimation is one of the most valuable aspects of the potential outcomes framework since causal effects are defined independently of an estimation method. This approach is part of the "new" causal inference field in statistics, although causal inference has always been central to econometrics. In the last two to three decades, these separate but related fields have found plenty of common ground regarding causality. The new part is a clear definition of causal effects and a mathematical notation based on potential outcomes and counterfactuals that continues to expand and clarify our understanding of established methods and facilitates the development of new ones.

Our approach is based on the premise that complex concepts are better understood when first introduced with intuitive examples and graphs, followed by theory, and then practical applications using statistical software. Based on our experience teaching graduate-level classes, we think that students learn best by doing, and "doing" means relating the theory to application using statistical software. Some concepts are difficult to understand in theory but are relatively easy to understand when implemented in practice (and vice versa).

We strive to present theory intuitively but formally to show how the theory is applied and why methods work, which is essential for understanding when specific methods should be used and what meaning can be derived from the estimators. It is also the basis for understanding methods that have not yet been developed. This is not a "cookbook approach" book in the sense that we do not focus on rules for specific situations because more often than not it is not possible to precisely spell out or anticipate the specific situation a rule requires. Instead, we focus on the principles, concepts, and assumptions needed -- the how, why, when, and what -- which can then be evaluated in specific situations.

We do not shy away from presenting complex concepts and mathematical notation because they are essential tools to develop intuition on how and why statistical methods work. Mathematics is a language that makes the job easier, not more difficult. Mathematics allows us to represent ideas and concepts using symbols, and we manipulate these symbols to discover new ideas and prove propositions that might not be self-evident. Manipulating complex ideas in our minds without the use of symbols is much more difficult. However, we always provide the intuition behind the mathematics to help students understand how the symbols relate to ideas since not all students are comfortable with mathematics. At the end of the course(s), students should be able to understand the language of mathematics as it applies to statistical analysis. Our recommendation to students is to think of mathematics as a language. The first step in understanding a concept expressed with mathematics is to make sure the meaning of the mathematical notation is well understood.

This book is intended for advanced undergraduates, master's students, and doctoral students in health services research, health economics, public policy, public health, and related fields. Students in these disciplines come from diverse backgrounds with different levels of preparation. We assume the same background that is commonly required for admission to these programs: two semesters of calculus and introductory statistics. A class on linear regression would be helpful, but not strictly necessary since we review the essential features of linear models. We keep linear algebra to a minimum. The goal of the mathematical appendix is to review the mathematical background needed to understand the rest of the book. We hope that students go over the introductory material even if it is not assigned by instructors. Each new concept is based on previous concepts; it is a lack of knowledge of the basics, and the corresponding notation, that confuses students the most. Previous knowledge of Stata or R is helpful, although the background chapters also serve as an introduction to Stata, and the supplementary material reproduces the code using R.
