Code
Over the years, I have written Stata commands and SAS macros that are still being used. Some are part of the teaching materials that I'm modifying and expanding, with added R code, for my forthcoming book. I started posting them online in graduate school. Since I still get questions about them, I keep them around in case they are of use to somebody. I now use Stata and Python, and SAS only for the SQL procedure.
Maximum likelihood estimation (MLE)
Examples of MLE in Stata.
- Tobit using method lf
- Tobit using method d0
- Predictions for Tobit models
- Mixture of two normals
- Two Tobits
- Logit
- Probit
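As a rough illustration of what the Tobit examples above maximize, here is a minimal Python sketch of a Tobit log-likelihood with left-censoring. It is not the Stata code linked above: the function names, the single covariate, and the censoring point `lower` are all assumptions made for the illustration.

```python
import math


def norm_logpdf(z):
    """Log density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(b0, b1, sigma, x, y, lower=0.0):
    """Log-likelihood of a Tobit model left-censored at `lower`.

    Hypothetical setup: latent y* = b0 + b1*x + e, e ~ N(0, sigma^2),
    and we observe y = max(y*, lower).
    """
    ll = 0.0
    for xi, yi in zip(x, y):
        mu = b0 + b1 * xi
        if yi <= lower:
            # Censored observation: contributes P(y* <= lower).
            p = norm_cdf((lower - mu) / sigma)
            ll += math.log(max(p, 1e-300))  # guard against log(0)
        else:
            # Uncensored observation: contributes the normal density.
            ll += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return ll


# Made-up data: two censored and two uncensored observations.
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [0.0, 0.0, 1.2, 2.9]
print(tobit_loglik(-1.0, 1.2, 1.0, x_demo, y_demo))
```

Each term in the loop is the contribution of one observation, and the function returns their sum. An optimizer would then search over `b0`, `b1`, and `sigma` for the values that maximize it.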
Finite Mixture Models (FMM)
See these slides for an example using NHANES 09-10 data. With a simple mixture of two normals and a single predictor variable, 90 percent of observations are correctly classified by sex.
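To make the classification idea concrete, here is a minimal Python sketch that fits a mixture of two normals and assigns each observation to its more likely component. It differs from the slides in several ways, all of them assumptions: the data are simulated (not NHANES), there is no predictor variable, and the fit uses the EM algorithm rather than direct maximization of the likelihood.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normals(x, n_iter=100):
    """EM for a two-component normal mixture (illustrative helper)."""
    n = len(x)
    xs = sorted(x)
    # Crude starting values: quartiles for the means, overall SD for both.
    mu0, mu1 = xs[n // 4], xs[(3 * n) // 4]
    grand = sum(x) / n
    s0 = s1 = math.sqrt(sum((xi - grand) ** 2 for xi in x) / n)
    p1 = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability of belonging to component 1.
        w = []
        for xi in x:
            a = (1.0 - p1) * normal_pdf(xi, mu0, s0)
            b = p1 * normal_pdf(xi, mu1, s1)
            w.append(b / (a + b))
        # M-step: weighted means, standard deviations, and mixing share.
        n1 = sum(w)
        n0 = n - n1
        mu1 = sum(wi * xi for wi, xi in zip(w, x)) / n1
        mu0 = sum((1.0 - wi) * xi for wi, xi in zip(w, x)) / n0
        s1 = math.sqrt(sum(wi * (xi - mu1) ** 2 for wi, xi in zip(w, x)) / n1)
        s0 = math.sqrt(sum((1.0 - wi) * (xi - mu0) ** 2 for wi, xi in zip(w, x)) / n0)
        p1 = n1 / n
    # w holds the posterior weights from the last E-step.
    return p1, (mu0, s0), (mu1, s1), w


# Simulated data: two latent groups with means 0 and 3 and unit variance.
random.seed(1)
labels = [random.random() < 0.5 for _ in range(1000)]
x = [random.gauss(3.0 if g else 0.0, 1.0) for g in labels]

p1, comp0, comp1, w = fit_two_normals(x)
predicted = [wi > 0.5 for wi in w]
accuracy = sum(p == g for p, g in zip(predicted, labels)) / len(x)
print(f"share correctly classified: {accuracy:.3f}")
```

The group labels are never shown to the fitting routine. They are used only at the end to check how often the posterior assignment recovers the true group.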
Zero-inflated Censored Normal(s) Mixtures: I used a mixture of two Tobit normals and a degenerate distribution with mass at zero to predict the EQ-5D preference score from the SF-12. See the paper and appendix for details:
Perraillon M, Shih Y-CT, Thisted RA. "Predicting the EQ-5D Preference Index from the SF-12 Health Survey: A Finite Mixture Approach." Medical Decision Making, October 2015, vol. 35, no. 7, pp. 888-901.
- Stata command to implement the mixture model
- Appendix_A_v1.doc
- Appendix_B_v2.docx
- predict_eq5d.do (Stata code for predictions)
To install the package in Stata, type: ssc install zicen. You can also download simulated/synthetic Stata data to try zicen; after installing the command, type: net get zicen.
Stata -teffects- command
Stata implements nonparametric estimation of treatment effects in the -teffects- command. It's a neat way of teaching nonparametric estimation of causal effects and issues of lack of overlap (common support), as well as issues with regression adjustment in general and the additional assumptions needed to obtain average treatment effects with regression adjustment. The idea behind teffects is that causal effects are nonparametrically identified, so they can also be estimated nonparametrically. The implementation can be confusing at first. Below is code to replicate -teffects- using regressions. This is part of Chapter 3 on potential outcomes in my book.
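The linked Stata code is the reference. As a language-neutral illustration of the regression adjustment idea (the approach behind teffects ra), here is a minimal Python sketch: fit a separate outcome regression in each treatment group, predict both potential outcomes for every unit, and average the differences. The single covariate, the linear outcome models, the function names, and the toy data are all assumptions made for the illustration.

```python
def ols_fit(x, y):
    """Intercept and slope of a simple linear regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_adjustment(y, d, x):
    """Return (ATE, ATET) from separate regressions by treatment group."""
    xt = [xi for xi, di in zip(x, d) if di == 1]
    yt = [yi for yi, di in zip(y, d) if di == 1]
    xc = [xi for xi, di in zip(x, d) if di == 0]
    yc = [yi for yi, di in zip(y, d) if di == 0]
    a1, b1 = ols_fit(xt, yt)  # outcome model among the treated
    a0, b0 = ols_fit(xc, yc)  # outcome model among the controls
    # Predicted potential outcomes for every unit, treated or not.
    effects = [(a1 + b1 * xi) - (a0 + b0 * xi) for xi in x]
    ate = sum(effects) / len(effects)
    treated_effects = [e for e, di in zip(effects, d) if di == 1]
    atet = sum(treated_effects) / len(treated_effects)
    return ate, atet


# Toy data without noise: y = 1 + 2x + d*(3 + x), so the effect is 3 + x.
x = [0.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0 + 2.0 * xi + di * (3.0 + xi) for xi, di in zip(x, d)]

ate, atet = regression_adjustment(y, d, x)
print(ate, atet)
```

In the toy data the controls have x between 0 and 3 and the treated between 2 and 5. The predicted untreated outcome for a treated unit with x = 5 is therefore an extrapolation of the control regression, which is the lack-of-overlap problem that regression adjustment hides.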
Regression discontinuity
Older version: Notes and Stata code from lectures at the University of Chicago. Warning: Some of the code is still helpful but I made many updates and changes over the years. See my Teaching page. If you see the same graphs and simulations in a certain book, they were used without my permission.
- Slides (2015 version)
- Stata do file
- Stata do file
Standardized differences
stdif.ado, a tiny command to calculate standardized differences. Syntax is: stdif varname, by(varname) [categorical unequal]. The categorical option uses the binomial formula for the variances (the varname variable must be a 0/1 indicator). When the sample size is large, the normal approximation works fine. The unequal option uses unequal variances when calculating t-test statistics. Categorical variables are compared using a chi-squared test.
Propensity scores
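Standardized differences are commonly used to check covariate balance between groups, for example before and after matching on the propensity score. A minimal Python sketch of the calculation follows. It assumes the usual formula, the difference in means divided by the square root of the average of the two group variances, with p(1-p) as the variance for a 0/1 indicator. The function name is hypothetical, and stdif.ado may differ in details such as scaling.

```python
import math
from statistics import mean, variance


def std_diff(x1, x0, categorical=False):
    """Standardized difference between two groups (illustrative helper).

    x1, x0: values of one variable in each group.
    categorical: if True, the variable must be a 0/1 indicator and the
    binomial variance p*(1-p) replaces the sample variance.
    """
    m1, m0 = mean(x1), mean(x0)
    if categorical:
        v1, v0 = m1 * (1.0 - m1), m0 * (1.0 - m0)
    else:
        v1, v0 = variance(x1), variance(x0)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)


# Continuous variable: means 4 and 3, both sample variances equal to 4.
d_cont = std_diff([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])

# 0/1 indicator: proportions 0.75 and 0.25.
d_bin = std_diff([1, 1, 1, 0], [1, 0, 0, 0], categorical=True)

print(d_cont, d_bin)
```

Unlike a t statistic, the standardized difference does not grow with the sample size, which is why it is a convenient balance measure when comparing matched and unmatched samples.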
Note: I wrote the code below a long time ago. I don't use SAS that much these days. The code uses pointers in the data step to make matching easier since indexing data is not simple in SAS.
- PSmatching.sas, a macro based on my code, written by William Thomas at the University of Minnesota. Steven Utke implemented a matching-with-replacement macro.
- NESUG 2006 paper. Stuff from a previous life.
- Global Forum 2007 paper. This paper includes code for global optimal matching, which minimizes the total distance between matched observations. Global matching doesn't work well with large datasets because it needs a matrix of distances between all pairs. Don't confuse optimal matching with full matching. Full matching is yet another way to do 1 to n matching (or n to 1) where n is not set a priori.
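The difference between greedy and global optimal matching can be seen in a very small example. The Python sketch below is not the SAS code from the papers above. It matches on a single score, uses hypothetical function names, and finds the optimal match by brute force over all assignments, which is feasible only for a handful of units.

```python
from itertools import permutations


def total_distance(pairs, treated, controls):
    """Sum of absolute score differences over matched pairs."""
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)


def greedy_match(treated, controls):
    """Match each treated unit, in order, to its nearest unused control."""
    available = list(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        j = min(available, key=lambda k: abs(t - controls[k]))
        available.remove(j)
        pairs.append((i, j))
    return pairs


def optimal_match(treated, controls):
    """Brute force: try every assignment, keep the smallest total distance."""
    best_pairs, best_cost = None, float("inf")
    for perm in permutations(range(len(controls)), len(treated)):
        pairs = list(enumerate(perm))
        cost = total_distance(pairs, treated, controls)
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost
    return best_pairs


# Made-up propensity scores for two treated and two control units.
treated = [0.50, 0.60]
controls = [0.58, 0.10]

greedy = greedy_match(treated, controls)
optimal = optimal_match(treated, controls)
print(greedy, total_distance(greedy, treated, controls))
print(optimal, total_distance(optimal, treated, controls))
```

Greedy matching gives the first treated unit its closest control and leaves the second with a poor match, while the optimal assignment accepts a worse first match to lower the total. The brute-force search grows factorially with the number of units. Practical implementations use assignment or network-flow algorithms instead, but they still need the matrix of pairwise distances.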