Previous Masterclasses

Spring 2023

Title: Difference-in-Differences Methods
Date: February 9-10, 2023
Time: 9:30 am to 5:00 pm
Instructor: Professor Pedro H.C. Sant’Anna
Assistant Professor, Vanderbilt University (on leave)
Principal Researcher in the Office of the Chief Economist at Microsoft

Course Description

Difference-in-Differences (DiD) methods are widely used to investigate a range of empirical questions in economics, political science, and many other social and medical sciences. These methods are also very popular in industry due to the increasing interest in empirical investigations involving causal inference. There has recently been a large increase in the number of new papers on DiD and related methodologies, making it challenging to keep up with rapidly evolving best practices. The main goal of this course is to provide an up-to-date discussion of this important area.

A more detailed outline of the course can be found here.

Fall 2017

November 30-December 1, 2017
Giorgio Primiceri, Northwestern University
Bayesian Inference in Macroeconomic Models

This course is an introduction to modern time series econometrics, with an emphasis on Bayesian methods to conduct inference in dynamic macroeconomic models. The two main subjects are vector autoregressions (VARs) and dynamic stochastic general equilibrium (DSGE) models, but we will touch upon several other topics, such as state-space models, Monte Carlo methods, model comparison and model choice. The focus on VARs and DSGEs is motivated by the fact that these models are placed at opposite sides of the spectrum in terms of the economic restrictions that they impose on the dynamics of macroeconomic time series.

VARs are very popular and flexible tools used for forecasting and the identification of economic shocks, representing a bridge between reduced-form and structural models. However, their flexibility comes at the cost of being very heavily parameterized. As a consequence, Bayesian inference is crucial to handle the proliferation of parameters and to dramatically improve their forecasting performance and the estimation accuracy of more structural objects (e.g. impulse responses).

The term DSGE model encompasses a broad class of macroeconomic models that spans the standard neoclassical growth model as well as New Keynesian monetary models with numerous shocks, real and nominal frictions. A common feature of these models is that decision rules of economic agents are derived from assumptions about preferences and technologies. Therefore, the DSGE paradigm delivers empirical models with a strong degree of theoretical coherence that are attractive for business cycle analysis and as laboratories for policy experiments. Bayesian techniques are widely employed for the estimation of DSGEs: prior distributions are used to add non-sample information, and posterior distributions summarize the uncertainty about model features and can be efficiently evaluated with modern Bayesian computational tools.

The course is self-contained and does not assume prior knowledge of Bayesian inference. It is meant to be a gateway to the rapidly growing literature on modern macroeconometrics.

Summer 2017

June 5-6, 2017
Fedor Iskhakov (Australian National University), John Rust (Georgetown University), and Bertel Schjerning (University of Copenhagen)
Dynamic Programming–Theory, Computation and Empirical Applications

Dynamic programming (DP) is a fundamental tool in modern economics: it enables us to model decision-making over time and under uncertainty and is a general tool for modeling a wide range of phenomena, from individual retirement decisions to bidding in auctions, and price setting, investment, and financial decisions of firms. This course is focused on the empirical application of DP models and will discuss state-of-the-art methods for solving and simulating DP models and estimating them econometrically, with numerous actual empirical applications to illustrate how these tools and methods are used in practice. We will also discuss the formulation and solution of dynamic equilibrium models and dynamic games and provide state-of-the-art algorithms for finding equilibria and simulating and estimating such models. We will also discuss a growing line of research on behavioral models and ways to deal with some of the limitations of models of “full rationality” including the curse of dimensionality, the identification problem, and the problem of multiplicity of equilibria. 

The lecture will be presented by three leading contributors to this literature. John Rust is a Professor of Economics at Georgetown and has made important contributions to the literature on the estimation of dynamic discrete choice models. In 1992, Rust was awarded the Frisch Medal (the highest award from the Econometric Society for the best empirical paper published in Econometrica in the preceding five-year period) for his paper, “Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher”. Fedor Iskhakov is a Senior Lecturer in Economics at Australian National University and winner of His Majesty the King of Norway Golden Medal for best research in social sciences by a young researcher in Norway in 2008 for his PhD thesis, “A Dynamic Structural Analysis of Health and Retirement”. Bertel Schjerning is a Professor of Economics at the University of Copenhagen and head of the Centre for Computational Economics, and is well known for his contributions to several fields including his 2014 paper in the Journal of Public Economics with Daniel le Maire, “Tax bunching, income shifting and self-employment”. Iskhakov, Rust and Schjerning have collaborated on a number of projects at the frontiers of structural estimation and algorithmic game theory including their 2016 paper in the Review of Economic Studies, “Recursive Lexicographical Search: Finding all Markov Perfect Equilibria of Finite State Directional Dynamic Games”.

The course is designed to be “self-contained” and there are no prerequisites. It is targeted at graduate students who are interested in learning the tools and methods to formulate, solve, estimate and simulate dynamic models, as well as at policy makers who are interested in the current state of the art (as well as a frank discussion of the limitations) for these types of models in practical policy making. Some knowledge of optimization, computer programming and microeconomic theory at the level of a first-year PhD program in economics is useful but not required. The course will provide access to computer code and help with programming questions and with questions on how to apply these types of models to problems that attendees in this short course may encounter in their own research and work.


May 8-9, 2017
Bryan S. Graham, UC Berkeley
Econometric Analysis of Network Data

This masterclass will provide an overview of econometric methods appropriate for the analysis of social and economic networks. Many social and economic activities are embedded in networks. Furthermore, datasets with natural graph theoretic (i.e., network) structure are increasingly available to researchers. We will review (i) how to describe, summarize and visually present network data and (ii) formal econometric models of network formation that admit heterogeneity, strategic behavior, and/or dynamics. The focus will be on the formal development of methods, but selected empirical examples will also be covered, as will methods of practical computation. 

Fall 2016

September 26-27, 2016
Victor Chernozhukov, MIT
Machine Learning for Treatment Effects and Structural Equation Models



The course provides a practical introduction to modern high-dimensional function fitting methods (a.k.a. machine learning methods) for efficient estimation and inference on treatment effects and structural parameters in empirical economic models. Participants will use R so that they can immediately internalize the techniques and use them in their own academic and industry work. All lectures, except the introductory one, will be accompanied by R code that can be used to reproduce the empirical examples. Thus, there will be no gap between theory and practice.


This is the 7th edition of the course, to be given at Georgetown University. Previous editions were given at GSERM St. Gallen, CEMMAP, MIT in 2015 and 2016 as part of the 14.387 course “Applied Econometrics”, and in the Summer School of Econometrics of the Bank of Italy in Perugia.


1. Causal Inference in Approximately Sparse Linear Structural Equations Models.

  • Approximately sparse econometric models as generalizations of conventional econometric models
  • “Double lasso” or “double partialling out” methods for efficient estimation and inference of causal parameters in these models.
  • Various empirical examples.
  • References: 3, 4.

2. Understanding of the Inference Strategy via the Double Partialling Out and Adaptivity.

  • Theory: Frisch-Waugh Partialling Out. Adaptivity.
  • Laying a strategy for the use of non-sparse and generic ML methods.
  • R Practicum:  Mincer Equations, Barro-Lee, and Acemoglu-Johnson-Robinson examples.
  • References: 3, 4, 6.
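
The Frisch-Waugh partialling-out result named in this session can be illustrated with a small simulation. The sketch below is not taken from the course materials (the course practicum uses R). It is a Python illustration using only the standard library, with simulated data and hypothetical helper names (`ols_simple`, `residuals`, `solve`, `ols_multiple`). It checks that the coefficient on a treatment variable `d` from the full regression of `y` on a constant, `d`, and a control `x` equals the slope from regressing the residualized `y` on the residualized `d`.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Residuals from regressing y on a constant and x (partialling x out of y)."""
    a, b = ols_simple(x, y)
    return [yi - a - b * xi for xi, yi in zip(x, y)]

def solve(A, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - tail) / M[i][i]
    return sol

def ols_multiple(columns, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated data (illustrative assumption): the true coefficient on d is 2.0.
random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
d = [0.8 * xi + random.gauss(0, 1) for xi in x]
y = [1.0 + 2.0 * di + 1.5 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]

# Route 1: full regression of y on (1, d, x); keep the coefficient on d.
beta_full = ols_multiple([[1.0] * n, d, x], y)[1]

# Route 2: partial x out of both y and d, then regress residual on residual.
_, beta_fwl = ols_simple(residuals(x, d), residuals(x, y))

print(beta_full, beta_fwl)
```

The two printed numbers agree up to floating-point error. With many controls, the residualizing regressions would typically be replaced by a regularized fit such as Lasso, which this sketch does not attempt.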

3. ML Methods for Prediction = Reduced Form Estimation.  Evaluation of ML Methods using Test Samples.

  • Penalized Regression Methods: Ridge, Lasso, Elastic Nets, etc.
  • Regression Trees, Random Forest, Boosted Trees.
  • Modern Nonlinear Regression via Neural Nets and Deep Learning
  • Aggregation and Cross-Breeding of the ML methods.
  • R Practicum: Simulated, Wage, and Pricing Examples.
  • References: 1, 2, 9-11.

4. ML Methods for Causal Parameters — “Double” Machine Learning for Causal Parameters in Treatment Effect Models and Nonlinear Econometric Models

  • Using generic ML (beyond Lasso) to Estimate Coefficients in Partially Linear Models
  • Using generic ML to estimate ATE, ATT, LATE in Heterogeneous Treatment Effect Models
  • Using generic ML methods to estimate structural parameters in Moment Condition problems.
  • R Practicum: 401(k) Example.
  • References:  5, 6, 7, 8.

5. Scalability:  Working with Large Data. MapReduce, Hadoop and all that

  • MapReduce, Sufficient Statistics, Linear Estimators
  • MapReduce and Computation of Nonlinear Estimators via Distributed Gradient Descent
  • MapReduce in R. 
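
One reading of the "MapReduce, Sufficient Statistics, Linear Estimators" item is that least squares depends on the data only through a few sums, so each data chunk can be summarized independently (map) and the summaries added together (reduce). The sketch below illustrates that idea for a one-regressor regression in plain Python. It is an assumption-laden illustration rather than course code (the course uses R), and the function names `map_chunk`, `combine`, and `ols_from_stats` are hypothetical.

```python
import random
from functools import reduce

def map_chunk(chunk):
    """Map step: summarize one chunk of (x, y) pairs by its sufficient statistics."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: sufficient statistics from two chunks simply add up."""
    return tuple(u + v for u, v in zip(a, b))

def ols_from_stats(stats):
    """Recover (intercept, slope) of y on a constant and x from the summed statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Simulated data (illustrative assumption): y = 0.5 + 3.0 * x + noise.
random.seed(1)
data = []
for _ in range(1000):
    x = random.uniform(0, 10)
    data.append((x, 0.5 + 3.0 * x + random.gauss(0, 1)))

# Split into 10 chunks, as if each chunk lived on a different worker.
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]
stats = reduce(combine, map(map_chunk, chunks))
intercept, slope = ols_from_stats(stats)

# Single-pass estimate on the full data, for comparison.
full_intercept, full_slope = ols_from_stats(map_chunk(data))
print(intercept, slope)
```

Only five numbers per chunk cross the network, regardless of chunk size. Nonlinear estimators generally lack such finite summaries, which is presumably why the outline turns to distributed gradient descent for them.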


Please bring your computer to class. Install R and RStudio. Install the packages “hdm”, “glmnet”, “nnet”, “randomForest”, “rpart”, “rpart.plot”, and “gbm” from CRAN (e.g., type install.packages("gbm")). If you are not familiar with R, try out several of the introductory tutorials that are available online. Please read and understand the idea of cross-validation (k-fold cross-validation) to prevent overfitting, and the bias-variance tradeoff in nonparametric estimation. I will mention these briefly in class, but I will count on you understanding these background concepts. A good reference is “The Elements of Statistical Learning”, which is available from Tibshirani’s website.
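
For readers new to k-fold cross-validation, the following minimal sketch may help fix ideas before class. It is not course material (the course uses R). It is a standard-library Python illustration on simulated data, with hypothetical helper names, that picks a ridge penalty for a one-regressor model without an intercept by comparing average out-of-fold prediction error.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def ridge_slope(xs, ys, lam):
    """Ridge estimate for y = b * x (no intercept): b = sum(xy) / (sum(xx) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, folds):
    """Average out-of-fold mean squared error for one penalty value."""
    total = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        b = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in test) / len(test)
    return total / len(folds)

# Simulated data (illustrative assumption): y = 1.5 * x + noise.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(60)]
ys = [1.5 * x + rng.gauss(0, 2) for x in xs]

folds = k_fold_indices(len(xs), k=5)
grid = [0.0, 0.1, 1.0, 10.0, 100.0]  # candidate penalties (arbitrary choice)
scores = {lam: cv_mse(xs, ys, lam, folds) for lam in grid}
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```

Each observation is held out exactly once, so every prediction error is computed on data the fitted model has not seen. This is what guards against choosing a penalty that merely overfits the training sample.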


References

  1. The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, and J. Friedman. The book can be downloaded for free!
  2. An Introduction to Statistical Learning with Applications in R, by G. James, D. Witten, T. Hastie and R. Tibshirani. The website has a lot of handy resources.
  3. “High-Dimensional Methods and Inference on Treatment and Structural Effects in Economics,” Journal of Economic Perspectives 2014, A. Belloni et al. Stata replication code is here. R code implementation is in package “hdm”.
  4. “Inference on Treatment Effects After Selection Amongst High-Dimensional Controls (with an Application to Abortion and Crime),” ArXiv 2011, The Review of Economic Studies 2013, A. Belloni et al. Stata and Matlab programs are here; replication files here. R code implementation in package “hdm”.
  5. “Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations,” ArXiv 2013, Journal of Econometrics 2015, M. Farrell.
  6. “Post-Selection and Post-Regularization Inference: An Elementary, General Approach,” Annual Review of Economics 2015, V. Chernozhukov, C. Hansen, and M. Spindler. R code implementation in package “hdm”.
  7. “Program Evaluation and Causal Inference with High-Dimensional Data,” ArXiv 2013, Econometrica 2016+, A. Belloni et al. R code implementation in package “hdm”. Replication files via Econometrica website.
  8. “Double Machine Learning for Causal and Treatment Effects,” MIT Working Paper, V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey.
  9. “Big Data: New Tricks for Econometrics,” Journal of Economic Perspectives 2014, H. Varian.
  10. “Economics in the age of big data,” Science 2014, L. Einav, J. Levin.
  11. “Prediction Policy Problems,” American Economic Review P&P 2015, J. Kleinberg, J. Ludwig, S. Mullainathan, Z. Obermeyer.


March 31-April 1, 2016
Charles Manski, Northwestern University
Kenneth Wolpin, Rice University
The Role of Theory and Uncertainty in Policy Evaluation

This masterclass brings together two highly distinguished economists to examine the role of theory and uncertainty in policy evaluation. It is organized in six sessions. The first session focuses on the use of descriptive statistics. The second session discusses partial identification of the treatment response model, using right-to-carry laws as a case study. The third session considers partial identification of a structural model. The fourth session, complementary to the third, examines ex ante policy evaluation using both parametric and nonparametric approaches. The fifth session discusses the policy use of discrete choice dynamic programming models and will cover both methodological issues and empirical applications. The final session is devoted to a discussion of the importance of (simple) theory in inferential empirical work.

Fall 2015

December 7-8, 2015
Steven T. Berry
Yale University
Empirical Models of Differentiated Products

This course considers the identification and estimation of models of market equilibrium with differentiated products, including applications to various policy-relevant markets. Most real-world markets feature differentiated products, in reputation and service quality if not in explicit product characteristics. Examples include private markets for physically differentiated goods (automobiles) and markets for various media products (newspapers), as well as for partially privatized and highly regulated goods (such as education in many countries). In the course, we consider how data can reveal demand and cost parameters, including recent results on formal identification. We go on to discuss theoretical, practical and computational aspects of estimation. While many of the models condition on the set of products being offered, we also consider models with endogenous product characteristics, such as location, type and quality of the product. Empirical applications feature policy-relevant markets like health, media and education, as well as classic applications to antitrust analysis.