(c) 2017 by Thomas Wiecki
Hierarchical models are underappreciated. Hierarchies exist in many data sets, and modeling them appropriately adds a boatload of statistical power (the common metric of statistical power). I provided an introduction to hierarchical models in a previous blog post: "Best Of Both Worlds: Hierarchical Linear Regression in PyMC3", written with Danne Elbers. See also my interview with FastForwardLabs where I touch on these points.
Here I want to focus on a common but subtle problem when trying to estimate these models, and how to solve it with a simple trick. Although I had been somewhat aware of this trick for quite some time, it only recently clicked for me. We will use the same hierarchical linear regression model on the Radon data set from the previous blog post, so if you are not familiar with it, I recommend starting there.
I will then use the intuitions we've built up to highlight a subtle point about expectations vs modes (i.e. the MAP). Several talks by Michael Betancourt have really expanded my thinking here.
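As a quick, purely illustrative toy example (not part of the Radon analysis) of why expectations and modes can disagree, consider a skewed distribution like the lognormal, whose mode sits well below its mean:

```python
import numpy as np

np.random.seed(123)

# Draw from a lognormal(mu=0, sigma=1): a simple skewed distribution.
samples = np.random.lognormal(mean=0., sigma=1., size=200000)

analytic_mode = np.exp(-1.)   # exp(mu - sigma^2)     ~ 0.37
analytic_mean = np.exp(0.5)   # exp(mu + sigma^2 / 2) ~ 1.65

# The sample mean lands near 1.65, nowhere near the mode at ~0.37.
print(samples.mean())
```

A point estimate at the mode (the MAP) can thus sit in a very different part of the distribution than the expectation that Bayesian inference actually averages over.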
```python
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import pandas as pd
import theano
import seaborn as sns

sns.set_style('whitegrid')
np.random.seed(123)

data = pd.read_csv('../data/radon.csv')
data['log_radon'] = data['log_radon'].astype(theano.config.floatX)
county_names = data.county.unique()
county_idx = data.county_code.values
n_counties = len(data.county.unique())
```
The intuitive specification
Usually, hierarchical models are specified in a centered way. In a regression model, individual slopes would be centered around a group mean with a certain group variance, which controls the shrinkage:
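In symbols, the centered specification for the county intercepts reads (the second parameter denotes the standard deviation, matching PyMC3's `sd` argument; the slopes `b` follow the same pattern):

```latex
\mu_a \sim \mathcal{N}(0,\; 100^2), \qquad \sigma_a \sim \text{HalfCauchy}(5)
```

```latex
a_i \sim \mathcal{N}(\mu_a,\; \sigma_a), \qquad i = 1, \dots, n_{\text{counties}}
```

Each $a_i$ is drawn directly around the group mean $\mu_a$, which is what "centered" refers to.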
```python
with pm.Model() as hierarchical_model_centered:
    # Hyperpriors for group nodes
    mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0., sd=100**2)
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    # Intercept for each county, distributed around group mean mu_a.
    # Above we just set mu and sd to a fixed value while here we
    # plug in a common group distribution for all a and b (which are
    # vectors of length n_counties).
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=n_counties)
    # Slope for each county, distributed around group mean mu_b
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=n_counties)

    # Model error
    eps = pm.HalfCauchy('eps', 5)

    # Linear regression
    radon_est = a[county_idx] + b[county_idx] * data.floor.values

    # Data likelihood
    radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps,
                           observed=data.log_radon)
```
```python
# Inference button (TM)!
with hierarchical_model_centered:
    hierarchical_centered_trace = pm.sample(draws=5000, tune=1000, njobs=4)[1000:]
```
```
Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -1,090.4: 100%|██████████| 200000/200000 [00:37<00:00, 5322.37it/s]
Finished [100%]: Average ELBO = -1,090.4
100%|██████████| 5000/5000 [01:15<00:00, 65.96it/s]
```