Quantifying Uncertainty: Evaluating Trading Algorithms using Probabilistic Programming


Thomas Wiecki


Quantopian Inc.



About me

  • PhD from Brown University -- research on computational neuroscience and machine learning using Bayesian modeling.
  • Quantitative Researcher at Quantopian Inc: Building the world's first algorithmic trading platform in the web browser.
  • @twiecki on Twitter.

Outline

  1. Motivation / Problem: Identify stellar quants with limited data
  2. The naive approach: point estimates
  3. Quick primer on Random Variables and Bayesian statistics
  4. Quantifying uncertainty
  5. Including prior knowledge
  6. Hierarchical modeling

1. Motivation

Live Trading

Crowd-sourced hedge fund

In [2]:
from IPython.display import Image
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_context('talk', rc={'lines.markeredgewidth': 0.1})
from matplotlib import rc
rc('xtick', labelsize=12)
rc('ytick', labelsize=12)
%pylab --no-import-all inline

import pymc as pm
df = pd.read_pickle('live_data.pickle')
Populating the interactive namespace from numpy and matplotlib


In [3]:
hier_df = []
for algo_id, algo_df2 in df.groupby(level=0):
    # Require a minimum track record of 50 days
    if len(algo_df2) < 50:
        continue
    # Exclude one known problematic algorithm
    if algo_id == '5396f34a2cb51f811b00005f':
        continue
    algo_df2 = algo_df2.reset_index(level=0)
    # Drop outlier returns beyond 3 standard deviations from the mean
    mu = algo_df2.returns_rf.mean()
    sd = algo_df2.returns_rf.std()
    idx = (algo_df2.returns_rf < (mu + 3*sd)) & (algo_df2.returns_rf > (mu - 3*sd))
    hier_df.append(algo_df2.loc[idx])
hier_df = pd.concat(hier_df)

2. The naive approach

Use one of the many risk metrics that people have devised, like the Sharpe ratio:

$$\text{Sharpe} = \frac{\text{mean return}}{\text{volatility}}$$
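As a point estimate this boils down to two sample statistics; a minimal sketch (assuming a pandas Series of daily excess returns, such as the returns_rf column used above):

import numpy as np

def point_estimate_sharpe(returns, ann_factor=252):
    # Naive annualized Sharpe ratio: mean return over volatility
    return returns.mean() / returns.std() * np.sqrt(ann_factor)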

In [5]:
sns.distplot(hier_df.groupby('live_algo_id').last().sharpe)
In [3]:
for algo_id, df_algo in df.groupby(level=0):
    df_algo.reset_index().sharpe.plot(lw=1)
plt.xlabel('days running')
plt.ylabel('Sharpe ratio')

Look familiar?

3. Quick primer on random variables

  • Represents our beliefs about an unknown state.
  • Probability distribution assigns a probability to each possible state.
  • Not a single number (e.g. most likely state).

You already know what a variable is...

In [8]:
coin = 0 # 0 for tails
coin = 1 # 1 for heads

A random variable assigns a probability to each possible value

In []:
coin = {0: 0.5,  # tails with probability 50%
        1: 0.5}  # heads with probability 50%

Alternatively:

coin ~ Bernoulli(p=0.5)

  • coin is a random variable
  • Bernoulli is a probability distribution
  • ~ reads as "is distributed as"
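We can draw from this random variable directly; a quick sketch using scipy.stats (illustrative, not from the original slides):

from scipy import stats

coin = stats.bernoulli(p=0.5)  # coin ~ Bernoulli(p=0.5)
flips = coin.rvs(10)           # ten simulated flips, e.g. array([0, 1, 1, ...])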

That was the discrete (binary) case; what about the continuous case?

returns ~ Normal($\mu$, $\sigma^2$)

In [30]:
from scipy import stats
sns.distplot(df.benchmark, kde=False, fit=stats.norm)
plt.xlabel('returns')

How to estimate $\mu$ and $\sigma$?

  • Naive: point estimates
  • Set mu = np.mean(data) and sigma = np.std(data)
  • This is the Maximum Likelihood Estimate
  • Gives the correct answer as $n \rightarrow \infty$ (see the sketch below)
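A minimal sketch of the point-estimate approach on simulated data (values illustrative only):

import numpy as np

np.random.seed(0)
true_mu, true_sigma = 0.05, 1.0

for n in [10, 100, 10000]:
    data = np.random.normal(true_mu, true_sigma, size=n)
    # The point estimates only converge to the true values as n grows large
    print(n, np.mean(data), np.std(data))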

Bayesian analysis

  • Most of the time $n \neq \infty$...
  • Uncertainty about $\mu$ and $\sigma$
  • Turn $\mu$ and $\sigma$ into random variables
  • How to estimate?

Bayes Formula!

Use prior knowledge and data to update our beliefs.
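For reference, the formula itself (standard notation; it did not appear on the original slide):

$$\underbrace{P(\theta \mid \text{data})}_{\text{posterior}} = \frac{\overbrace{P(\text{data} \mid \theta)}^{\text{likelihood}} \cdot \overbrace{P(\theta)}^{\text{prior}}}{\underbrace{P(\text{data})}_{\text{evidence}}}$$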

In [4]:
figsize(7, 7)
from IPython.html.widgets import interact, interactive
from scipy import stats
def gen_plot(n=0, bayes=False):
    np.random.seed(3)
    x = np.random.randn(n)
    ax1 = plt.subplot(221)
    ax2 = plt.subplot(222)
    ax3 = plt.subplot(223)
    if n > 1:
        sns.distplot(x, kde=False, ax=ax3, rug=True, hist=False)
    
    def gen_post_mu(x, mu0=0., sigma0=5.):
        # Conjugate normal-normal update of the mean (likelihood sd fixed at 1)
        sigma = 1.
        n = len(x)

        if n == 0:
            post_mu = mu0
            post_var = sigma0**2
        else:
            post_var = 1. / (1./sigma0**2 + n/sigma**2)
            post_mu = (mu0/sigma0**2 + np.sum(x)/sigma**2) * post_var
        return stats.norm(post_mu, np.sqrt(post_var))

    def gen_post_var(x, alpha0, beta0):
        # Conjugate inverse-gamma update of the variance (mean fixed at 0)
        mu = 0.
        n = len(x)
        post_alpha = alpha0 + n/2.
        post_beta = beta0 + np.sum((x - mu)**2)/2.
        return stats.invgamma(post_alpha, scale=post_beta)
    
    
    mu_lower = -.3
    mu_upper = .3
    
    sigma_lower = 0
    sigma_upper = 3
    ax2.set_xlim((2, n+1))
    ax1.axvline(0, lw=.5, color='k')
    ax2.axhline(0, lw=0.5, color='k')
    if bayes:
        post_mu = gen_post_mu(x, 0, 5)
        if post_mu.ppf(.05) < mu_lower:
            mu_lower = post_mu.ppf(.01)
        if post_mu.ppf(.95) > mu_upper:
            mu_upper = post_mu.ppf(.99)
        x_mu = np.linspace(mu_lower, mu_upper, 500)
        ax1.plot(x_mu, post_mu.pdf(x_mu))
        # Track the 90% credible interval of mu as data accumulates
        l = []
        u = []
        for i in range(1, n):
            norm = gen_post_mu(x[:i])
            l.append(norm.ppf(.05))
            u.append(norm.ppf(.95))
        ax2.fill_between(np.arange(2, n+1), l, u, alpha=.3)
        ax1.set_ylabel('belief')
    else:
        mu = np.mean(x)
        ax1.axvline(mu)
        # Track the running point estimate of mu as data accumulates
        ax2.plot(np.arange(2, n+1), [np.mean(x[:i]) for i in range(1, n)])
        
    ax1.set_xlim((mu_lower, mu_upper))
    ax1.set_title('current mu estimate')
    ax2.set_title('history of mu estimates')
    
    ax2.set_xlabel('n')
    ax2.set_ylabel('mu')
    ax3.set_title('data (returns)')
    plt.tight_layout()
In [6]:
interactive(gen_plot, n=(0, 600), bayes=True)

Probabilistic Programming

  • Model unknown causes (e.g. $\mu$) of a phenomenon as random variables.
  • Write a programmatic story of how unknown causes result in observable data.
  • Use Bayes formula to invert generative model to infer unknown causes.
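The forward direction of that story is easy to write down; a short sketch of simulating observable returns from assumed unknown causes (illustrative values only):

import numpy as np

# Unknown causes -- here we pick them ourselves to simulate data
mu, sigma = 0.001, 0.02

# The programmatic story: each day's return is a noisy draw around mu
returns = np.random.normal(mu, sigma, size=252)

# Inference runs this story backwards: given observed returns,
# what beliefs about mu and sigma are justified?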

Approximating the posterior with MCMC sampling

In [7]:
def plot_want_get():
    from scipy import stats
    fig = plt.figure(figsize=(14, 6))
    ax1 = fig.add_subplot(121, title='What we want', ylim=(0, .5), xlabel='', ylabel='')
    x = np.linspace(-4, 4, 100)
    ax1.plot(x, stats.norm.pdf(x), lw=4.)
    ax2 = fig.add_subplot(122, title='What we get')
    sns.distplot(np.random.randn(10000), ax=ax2);
    ax2.set_xlim((-4, 4));
    ax2.set_ylim((0, .5));

Approximating the posterior with MCMC sampling

In [8]:
plot_want_get()

PyMC3

  • Probabilistic Programming framework written in Python.
  • Allows for construction of probabilistic models using intuitive syntax.
  • Features advanced MCMC samplers.
  • Fast: Just-in-time compiled by Theano.
  • Extensible: easily incorporates custom MCMC algorithms and unusual probability distributions.
  • Authors: John Salvatier, Chris Fonnesbeck, Thomas Wiecki
  • Still in alpha (documentation is sparse)!
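To give a feel for the syntax before the real model below, a minimal toy example (a sketch, not from the talk; assumes the PyMC3 alpha imported above as pm):

with pm.Model():
    # Unknown cause as a random variable with a prior
    mu = pm.Normal('mu', mu=0, sd=1)
    # Observed data tied to the unknown cause
    obs = pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))
    # Draw posterior samples with the NUTS sampler
    trace = pm.sample(500, pm.NUTS())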

Let's select a single algorithm to work with.

In [9]:
# algo_df holds the daily track record of one algorithm
# (one group of hier_df; the exact selection is not shown in the notebook)
algo_df.sharpe.plot()

Statistical model

$\mu \sim \text{Normal}(0, 1)$

$\sigma \sim \text{Gamma}(\text{mean}=0.01, \text{sd}=2)$

$\text{returns} \sim \text{Normal}(\mu, \sigma^2)$

$\text{Sharpe} = \frac{\mu}{\sigma}$

In [29]:
x = np.linspace(-4, 4, 500)
plt.plot(x, stats.norm(0, 1).pdf(x), lw=5, color='k')
plt.grid(False)
plt.ylim(-.01, 0.45)
plt.savefig('bayes_mu_prior.svg')

plt.figure()
plt.plot(x, stats.norm(0, .4).pdf(x), lw=5, color='k')
plt.grid(False)
plt.ylim(-.01, 1.2)
plt.savefig('bayes_mu_post.svg')

plt.figure()
x = np.linspace(0, 8, 500)
plt.plot(x, stats.gamma(3, 0).pdf(x), lw=5, color='k')
plt.grid(False)
plt.savefig('bayes_sigma_prior.svg')

plt.figure()
plt.plot(x, stats.gamma(2, 0).pdf(x), lw=5, color='k')
plt.grid(False)
plt.savefig('bayes_sigma_post.svg')

Graphical model of returns

In [38]:
returns_data = df.benchmark.values

This is what the data looks like

In [39]:
print(returns_data)
[ 0.00021801  0.00065388  0.00272272 ..., -0.00572433  0.00782789
 -0.0161355 ]

In [14]:
import pymc as pm

with pm.Model() as model:
    # Priors on Random Variables
    mean_return = pm.Normal('mean return', mu=0, sd=1)
    volatility = pm.Gamma('volatility', mu=.01, sd=2)

    # Model returns as Normal
    obs = pm.Normal('returns', 
                    mu=mean_return, 
                    sd=volatility,
                    observed=returns_data)
    
    sharpe = pm.Deterministic('sharpe ratio', 
                              (mean_return / volatility) * np.sqrt(252) )
In [15]:
with model:
    # Instantiate MCMC sampler
    step = pm.NUTS()
    # Draw 500 samples from the posterior
    trace = pm.sample(500, step)
 [-----------------100%-----------------] 500 of 500 complete in 0.4 sec
In [13]:
sns.tsplot(b_sharpe, err_style='unit_points', interpolate=True)
plt.xlabel('# trading days')
plt.ylabel('Bayesian Sharpe ratio');
In [30]:
print('P(Sharpe Ratio > 0) = %.2f%%' %
      (np.mean(trace['sharpe ratio'] > 0) * 100))
P(Sharpe Ratio > 0) = 95.40%
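Other summaries come just as cheaply from the posterior samples; for example (a sketch using the trace from above):

# 90% credible interval for the annualized Sharpe ratio
lower, upper = np.percentile(trace['sharpe ratio'], [5, 95])
print('90%% credible interval: [%.2f, %.2f]' % (lower, upper))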

In [16]:
pm.traceplot(trace);
In [12]:
def bayesian_sharpe(algo_df):
    with pm.Model():
        # Priors on mean return and volatility
        mu_returns = pm.Normal('mu_returns', mu=0, sd=1)
        sd_returns = pm.Gamma('sd_returns', mu=.01, sd=2)
        # Model returns as Normal
        returns = pm.Normal('returns', mu=mu_returns, sd=sd_returns,
                            observed=algo_df.returns_rf)

        sharpe = pm.Deterministic('sharpe ratio',
                                  (mu_returns / sd_returns) * np.sqrt(252))
        step = pm.NUTS()
        trace = pm.sample(500, step)

    return trace['sharpe ratio']

b_sharpe = []
for i in range(3, len(algo_df)):
    b_sharpe.append(bayesian_sharpe(algo_df.iloc[:i]))

b_sharpe = np.asarray(b_sharpe).T
 [-----------------100%-----------------] 500 of 500 complete in 0.5 sec

Prior knowledge

In [17]:
bmark_mean = df.bmark_rf.mean()
bmark_sd = df.bmark_rf.std()

with pm.Model() as model:
    # Priors on Random Variables
    mean_return = pm.Normal('mean return', mu=bmark_mean, sd=.001)
    volatility = pm.Gamma('volatility', mu=bmark_sd, sd=.001)

    # Model returns as Normal
    returns = pm.Normal('returns', 
                        mu=mean_return, 
                        sd=volatility,
                        observed=returns_data)
    
    sharpe = pm.Deterministic('sharpe ratio', 
                              (mean_return / volatility) * np.sqrt(252) )
In [18]:
def bayesian_sharpe_info_prior(algo_df, mean, sd):
    with pm.Model():
        # Tight (informative) priors centered on the benchmark
        mean_returns = pm.Normal('mean_returns', mu=mean, sd=.001)
        volatility = pm.Gamma('volatility', mu=sd, sd=.001)
        # Model returns as Normal
        obs = pm.Normal('observed returns', mu=mean_returns, sd=volatility,
                        observed=algo_df.returns_rf)

        sharpe = pm.Deterministic('sharpe ratio',
                                  (mean_returns / volatility) * np.sqrt(252))
        start = pm.find_MAP()
        step = pm.Slice()
        trace = pm.sample(1000, step, start=start)

    return trace['sharpe ratio'][:500]

bmark_mean = df.bmark_rf.mean()
bmark_sd = df.bmark_rf.std()
b_sharpe_info = []
for i in range(3, len(algo_df)):
    b_sharpe_info.append(bayesian_sharpe_info_prior(algo_df.iloc[:i], bmark_mean, bmark_sd))

b_sharpe_info = np.asarray(b_sharpe_info).T
 [-----------------100%-----------------] 1000 of 1000 complete in 1.0 sec
In [19]:
sns.tsplot(b_sharpe_info, err_style='unit_points', interpolate=True)
plt.xlabel('# trading days')
plt.ylabel('Sharpe ratio');

Analysing multiple algorithms

Easy! Compute the Bayesian Sharpe ratio for each algorithm separately.

Unpooled

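The model below refers to returns_all, n_algos, and algo_idx, which are not defined in the notebook as shown. A plausible setup built from hier_df might look like this (an assumption, not the original code):

# Assumed setup: stack all algorithms' returns and index them by algorithm
returns_all = hier_df.returns_rf
algo_ids = hier_df.live_algo_id.unique()
n_algos = len(algo_ids)
# Integer index mapping each daily return to its algorithm
algo_idx = hier_df.live_algo_id.map(
    {aid: i for i, aid in enumerate(algo_ids)}).values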

In [116]:
with pm.Model() as model_unpooled:
    algo_ret = pm.Normal('algo_ret', mu=bmark_mean, sd=.5, 
                         shape=n_algos)
    algo_vol = pm.Gamma('algo_vol', mu=bmark_sd, sd=.01, 
                        shape=n_algos)
    
    # Model returns as Normal
    obs = pm.Normal('observed returns', 
                    mu=algo_ret[algo_idx], 
                    sd=algo_vol[algo_idx], 
                    observed=returns_all.values)
    
    sharpe = pm.Deterministic('sharpe ratio', (algo_ret / algo_vol) * np.sqrt(252))
In [121]:
with model_unpooled:
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace_unpooled = pm.sample(20000, step)[-2000:]
 [-----------------100%-----------------] 20000 of 20000 complete in 23.2 sec
In [122]:
pm.traceplot(trace_unpooled);
In [102]:
pm.forestplot(trace_unpooled, vars=['sharpe ratio']);