# Probabilistic Programming in Quantitative Finance

### @twiecki

• Lead Data Scientist at Quantopian Inc.: building a crowd-sourced hedge fund.
• PhD from Brown University -- research on computational neuroscience and machine learning using Bayesian modeling.

# The problem we're going to solve

Two real-money strategies:

In [76]:
plot_strats()


# Sharpe Ratio

$$\text{Sharpe} = \frac{\text{mean returns}}{\text{volatility}}$$
In [24]:
print "Sharpe ratio strategy etrade =", data_0.mean() / data_0.std() * np.sqrt(252)
print "Sharpe ratio strategy IB =", data_1.mean() / data_1.std() * np.sqrt(252)

Sharpe ratio strategy etrade = 0.627893606355
Sharpe ratio strategy IB = 1.43720181575


# Short primer on random variables

• Represents our beliefs about an unknown state.
• Probability distribution assigns a probability to each possible state.
• Not a single number (e.g. most likely state).

• "When I bet on horses, I never lose. Why? I bet on all the horses." – Tom Haverford

## You already know what a variable is...

In [8]:
coin = 0 # 0 for tails
coin = 1 # 1 for heads


## A random variable assigns a probability to each possible value

In [ ]:
coin = {0: 0.5,  # tails
        1: 0.5}  # heads


### Alternatively:

coin ~ Bernoulli(p=0.5)

• coin is a random variable
• Bernoulli is a probability distribution
• ~ reads as "is distributed as"
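As a quick sketch (my own numpy illustration, not from the talk), a Bernoulli random variable can be simulated by sampling, and the probability it assigns to each state recovered from the empirical frequencies:

```python
import numpy as np

rng = np.random.default_rng(42)

# coin ~ Bernoulli(p=0.5): each draw is 1 (heads) with probability p
p = 0.5
draws = rng.binomial(n=1, p=p, size=100_000)

# The empirical frequencies recover the probabilities assigned to each state
prob_heads = draws.mean()
prob_tails = 1 - prob_heads
```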

## This was discrete (binary); what about the continuous case?

returns ~ Normal($\mu$, $\sigma^2$)

In [77]:
from scipy import stats
sns.distplot(data_0, kde=False, fit=stats.norm)
plt.xlabel('returns')


# How to estimate $\mu$ and $\sigma$?

• Naive: point estimate
• Set mu = mean(data) and sigma = std(data)
• Maximum Likelihood Estimate
• Correct answer as $n \rightarrow \infty$
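The naive approach above can be sketched in a few lines (simulated returns stand in for `data_0`, which isn't reproduced here). Note that each estimate is a single number with no statement of uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily returns standing in for a strategy; true mu = 0.05%,
# true sigma = 1% per day
returns = rng.normal(loc=0.0005, scale=0.01, size=252)

# Maximum likelihood point estimates: a single number each, no uncertainty
mu_hat = returns.mean()
sigma_hat = returns.std()

# Annualized Sharpe ratio from the point estimates
sharpe_hat = mu_hat / sigma_hat * np.sqrt(252)
```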

# Bayesian analysis

• Most of the time $n \neq \infty$...
• Uncertainty about $\mu$ and $\sigma$
• Turn $\mu$ and $\sigma$ into random variables
• How to estimate?

# Bayes Formula!

Use prior knowledge and data to update our beliefs.
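The interactive plot below shows the posterior tightening as the number of observations grows. A minimal sketch of the same idea, using a conjugate Normal prior on $\mu$ with the volatility assumed known (my own illustration, not the talk's `gen_plot` helper):

```python
import numpy as np

def posterior_mu(data, prior_mu=0.0, prior_sd=0.1, sigma=0.01):
    """Conjugate Normal-Normal update: posterior mean and sd of mu,
    assuming the volatility sigma is known."""
    n = len(data)
    post_prec = 1 / prior_sd**2 + n / sigma**2   # precisions add
    post_sd = np.sqrt(1 / post_prec)
    post_mu = (prior_mu / prior_sd**2 + data.sum() / sigma**2) / post_prec
    return post_mu, post_sd

rng = np.random.default_rng(1)
data = rng.normal(0.001, 0.01, size=600)

mu_10, sd_10 = posterior_mu(data[:10])    # beliefs after 10 observations
mu_600, sd_600 = posterior_mu(data)       # beliefs after 600 observations
# More data -> tighter posterior: sd_600 < sd_10
```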

In [78]:
interactive(gen_plot, n=(0, 600), bayes=True)


# Probabilistic Programming

• Model unknown causes (e.g. $\mu$) of a phenomenon as random variables.
• Write a programmatic story of how unknown causes result in observable data.
• Use Bayes formula to invert generative model to infer unknown causes.
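The "programmatic story" can be written down directly (a sketch of the generative direction only; the values drawn here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# The story: draw the unknown causes, then the data they generate
mu = rng.normal(0, 0.1)                    # unknown cause: mean return
sigma = abs(rng.normal(0, 0.1))            # unknown cause: volatility
returns = rng.normal(mu, sigma, size=252)  # observable data: a year of returns

# Inference (Bayes formula) runs this story backwards:
# given only `returns`, recover plausible values of mu and sigma
```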

## Approximating the posterior with MCMC sampling

In [81]:
plot_want_get()
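A minimal random-walk Metropolis sampler (my own sketch of the idea, not PyMC3's internals) shows the mechanics: propose a move, accept it with probability given by the posterior ratio, and collect the visited points as samples:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(0.001, 0.01, size=252)   # stand-in daily returns

def log_post(mu, sigma=0.01, prior_sd=0.1):
    # log Normal(0, prior_sd^2) prior + log Normal(mu, sigma^2) likelihood
    return (-mu**2 / (2 * prior_sd**2)
            - ((data - mu)**2).sum() / (2 * sigma**2))

samples = []
mu = 0.0
for _ in range(5000):
    proposal = mu + rng.normal(0, 0.001)   # random-walk proposal
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_post(proposal) - log_post(mu):
        mu = proposal
    samples.append(mu)

posterior_mean = np.mean(samples[1000:])   # discard burn-in
```

The histogram of `samples` approximates the posterior of $\mu$; PyMC3's NUTS sampler does the same job far more efficiently.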


# PyMC3

• Probabilistic Programming framework written in Python.
• Allows for construction of probabilistic models using intuitive syntax.
• Features advanced MCMC samplers.
• Fast: Just-in-time compiled by Theano.
• Extensible: easily incorporates custom MCMC algorithms and unusual probability distributions.
• Authors: John Salvatier, Chris Fonnesbeck, Thomas Wiecki
• Upcoming beta release!

# Model returns distribution: Specifying our priors

In [82]:
x = np.linspace(-.3, .3, 500)
plt.plot(x, T.exp(pm.Normal.dist(mu=0, sd=.1).logp(x)).eval())
plt.title(u'Prior: mu ~ Normal(0, $.1^2$)'); plt.xlabel('mu'); plt.ylabel('Probability Density'); plt.xlim((-.3, .3));

In [83]:
x = np.linspace(-.1, .5, 500)
plt.plot(x, T.exp(pm.HalfNormal.dist(sd=.1).logp(x)).eval())
plt.title(u'Prior: sigma ~ HalfNormal($.1^2$)'); plt.xlabel('sigma'); plt.ylabel('Probability Density');


## Bayesian Sharpe ratio

$\mu \sim \text{Normal}(0, .1^2)$ $\leftarrow \text{Prior}$

$\sigma \sim \text{HalfNormal}(.1^2)$ $\leftarrow \text{Prior}$

$\text{returns} \sim \text{Normal}(\mu, \sigma^2)$ $\leftarrow \text{Observed!}$

$\text{Sharpe} = \frac{\mu}{\sigma}$

## This is what the data looks like

In [9]:
print data_0.head()

2013-12-31 21:00:00    0.002143
2014-01-02 21:00:00   -0.028532
2014-01-03 21:00:00   -0.001577
2014-01-06 21:00:00   -0.000531
2014-01-07 21:00:00    0.011310
Name: 0, dtype: float64

In [14]:
import pymc3 as pm

with pm.Model() as model:
    # Priors on the random variables
    mean_return = pm.Normal('mean return', mu=0, sd=.1)
    volatility = pm.HalfNormal('volatility', sd=.1)

    # Model returns as Normal
    obs = pm.Normal('returns',
                    mu=mean_return,
                    sd=volatility,
                    observed=data_0)

    sharpe = pm.Deterministic('sharpe ratio',
                              mean_return / volatility * np.sqrt(252))

In [15]:
with model:
    # Instantiate MCMC sampler
    step = pm.NUTS()
    # Draw 500 samples from the posterior
    trace = pm.sample(500, step)

 [-----------------100%-----------------] 500 of 500 complete in 0.4 sec

# Analyzing the posterior

In [84]:
sns.distplot(results_normal[0][0]['mean returns'], hist=False, label='etrade')
sns.distplot(results_normal[1][0]['mean returns'], hist=False, label='IB')
plt.title('Posterior of the mean'); plt.xlabel('mean returns')

In [85]:
sns.distplot(results_normal[0][0]['volatility'], hist=False, label='etrade')
sns.distplot(results_normal[1][0]['volatility'], hist=False, label='IB')
plt.title('Posterior of the volatility')
plt.xlabel('volatility')

In [86]:
sns.distplot(results_normal[0][0]['sharpe'], hist=False, label='etrade')
sns.distplot(results_normal[1][0]['sharpe'], hist=False, label='IB')
plt.title('Bayesian Sharpe ratio'); plt.xlabel('Sharpe ratio');

In [28]:
print 'P(Sharpe ratio IB > 0) = %.2f%%' % \
(np.mean(results_normal[1][0]['sharpe'] > 0) * 100)

P(Sharpe ratio IB > 0) = 96.48%

In [29]:
print 'P(Sharpe ratio IB > Sharpe ratio etrade) = %.2f%%' % \
(np.mean(results_normal[1][0]['sharpe'] > results_normal[0][0]['sharpe']) * 100)

P(Sharpe ratio IB > Sharpe ratio etrade) = 80.06%


# Value at Risk with uncertainty

In [88]:
ppc_etrade = post_pred(var_cov_var_normal, results_normal[0][0], 1e6, .05, samples=800)
ppc_ib = post_pred(var_cov_var_normal, results_normal[1][0], 1e6, .05, samples=800)
sns.distplot(ppc_ib, label='IB', norm_hist=True, hist=False, color='g')
plt.title('VaR'); plt.legend(loc=0)
plt.xlabel('5% daily Value at Risk (VaR) with \$1MM capital (in \$)')
plt.ylabel('Probability density'); plt.xticks(rotation=15);
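`post_pred` and `var_cov_var_normal` are helpers defined elsewhere in the talk. A sketch of what they might compute (my own reconstruction, with illustrative rather than actual posterior samples): evaluate the variance-covariance VaR once per posterior draw, giving a full distribution over VaR instead of a single number.

```python
import numpy as np
from scipy import stats

def var_cov_var(P, alpha, mu, sigma):
    """Variance-covariance VaR: the loss (in $) exceeded with probability
    alpha over one day, for capital P and Normal(mu, sigma) daily returns."""
    q = stats.norm.ppf(alpha, mu, sigma)   # alpha-quantile of daily returns
    return -P * q

# Illustrative posterior samples for mu and sigma (not the talk's trace)
rng = np.random.default_rng(3)
mu_samples = rng.normal(0.001, 0.0005, size=800)
sigma_samples = np.abs(rng.normal(0.01, 0.001, size=800))

# One VaR per posterior draw -> a distribution over VaR itself
var_samples = np.array([var_cov_var(1e6, 0.05, m, s)
                        for m, s in zip(mu_samples, sigma_samples)])
```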


# Interim summary

• Bayesian stats allows us to reformulate common risk metrics, use priors and quantify uncertainty.
• IB strategy seems better in almost every regard. Is it though?

# So far, we've only added confidence

In [89]:
sns.distplot(results_normal[0][0]['sharpe'], hist=False, label='etrade')
sns.distplot(results_normal[1][0]['sharpe'], hist=False, label='IB')
plt.title('Bayesian Sharpe ratio'); plt.xlabel('Sharpe ratio');
plt.axvline(data_0.mean() / data_0.std() * np.sqrt(252), color='b');
plt.axvline(data_1.mean() / data_1.std() * np.sqrt(252), color='g');


# Is this a good model?

In [93]:
sns.distplot(data_1, label='data IB', kde=False, norm_hist=True, color='.5')
for p in ppc_dist_normal:
    plt.plot(x, p, c='r', alpha=.1)
# Re-plot the last sample with a label so it shows up in the legend
plt.plot(x, p, c='r', alpha=.5, label='Normal model')
plt.xlabel('Daily returns')
plt.legend();


# Can it be improved? Yes!

Same model as before, but with a heavy-tailed Student-T distribution for the returns:

$\text{returns} \sim \text{T}(\nu, \mu, \sigma^2)$
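Why heavy tails matter, in a quick scipy check (my own illustration): a 5-sigma down day is essentially impossible under a Normal model but entirely plausible under a T with few degrees of freedom.

```python
from scipy import stats

sigma = 0.01               # 1% daily volatility
move = 5 * sigma           # a 5-sigma down day

# Probability of a move at least this extreme under each model
p_normal = stats.norm.sf(move / sigma)    # standard Normal tail
p_t = stats.t.sf(move / sigma, df=3)      # Student-T tail, nu = 3

# The T model assigns vastly more probability to extreme days
ratio = p_t / p_normal
```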

In [94]:
sns.distplot(data_1, label='data IB', kde=False, norm_hist=True, color='.5')
for p in ppc_dist_t:
    plt.plot(x, p, c='y', alpha=.1)
# Re-plot the last sample with a label so it shows up in the legend
plt.plot(x, p, c='y', alpha=.5, label='T model')
plt.xlabel('Daily returns')
plt.legend();


# Mean returns

In [96]:
sns.distplot(results_normal[1][0]['mean returns'], hist=False, color='r', label='normal model')
sns.distplot(results_t[1][0]['mean returns'], hist=False, color='y', label='T model')
plt.xlabel('Posterior of the mean returns'); plt.ylabel('Probability Density');


# Bayesian T-Sharpe ratio

In [97]:
sns.distplot(results_normal[1][0]['sharpe'], hist=False, color='r', label='normal model')
sns.distplot(results_t[1][0]['sharpe'], hist=False, color='y', label='T model')
plt.xlabel('Bayesian Sharpe ratio'); plt.ylabel('Probability Density');


# But why? The T distribution is more robust!

In [98]:
sim_data = list(np.random.randn(75)*.01)
sim_data.append(-.2)
sns.distplot(sim_data, label='data', kde=False, norm_hist=True, color='.5')
sns.distplot(sim_data, fit=stats.norm, kde=False, hist=False,
             fit_kws={'color': 'r', 'label': 'Normal'})
sns.distplot(sim_data, fit=stats.t, kde=False, hist=False,
             fit_kws={'color': 'y', 'label': 'T'})
plt.xlabel('Daily returns'); plt.legend();