## Saturday, December 1, 2012

### Characteristic Functions and scipy.stats

scipy.stats.distributions is among other things a nice formula collection.

One of the parts that are missing are the characteristic functions for the distributions.

Wikipedia is very good for large areas of statistics, see for some details and examples http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29 Wikipedia lists the characteristic funtion also on the pages for many distributions.

I wrote something like the script below already several times (for the easy cases).

The characteristic function for the normal distribution is easy, but looking at the characteristic function of the t-distribution, I wish someone had translated it into code already.

#### t-distribution

Since I haven't seen it yet, I sat down and tried it myself. I managed to code the characteristic function of the t-distribution, but it returns NaNs when it is evaluated close to zero for large df.

I didn't find a Bessel "k" function that works in this case

```>>> special.kv(50/2., 1e-30)
inf
>>> special.kve(50/2., 1e-30)
inf
```

The t-distribution approaches the normal distribution as the shape parameter, the degrees of freedom, gets large. So, the characteristic function of the t-distribution should be well behaved for large df. However, the individual terms go to infinity or zero.

Since in my current case, I don't care about the behavior of the characteristic function around zero, I stopped trying to get a better implementation.

Warning: monkey patching included in the script

Aside: I cannot make up my mind whether the abbreviation for characteristic function should be chf or cf. I have both versions in various scripts that I wrote.

#### The script

Here's my current script

```# -*- coding: utf-8 -*-
"""Characteristic Functions

Created on Fri Nov 30 22:43:36 2012
Author: Josef Perktold
"""

import numpy as np
from scipy import stats, special

def chf_normal(t, loc=0, scale=1):
'''characteristic function of normal distribution

Parameters
----------
t : array_like
points at which characteristic function is evaluated
loc : float (or array_like ?)
mean of underlying normal distribution
scale : float (or array_like ?)
standard deviation, scale of normal distribution

Returns
-------
chfval : ndarray
characteristic function evaluated at x

Notes
-----
Can be used for higher dimensional arguments if ``t``, ``mean`` and ``var``

'''
t = np.asarray(t)
return np.exp(1j * t * loc - 0.5 * t**2 * scale**2)

def chf_t(t, df, loc=0, scale=1):
'''characteristic function of t distribution

breaks down for large df and t close to zero
'''
t = np.asarray(t)
vhalf = df / 2.
term = np.sqrt(df) * np.abs(t*scale)
cf = special.kv(vhalf, term) * np.power(term, vhalf)
cf = cf / special.gamma(vhalf) / 2**(vhalf - 1)
cf = cf * np.exp(1j * loc * t)
if cf.shape == () and t == 0:
#special case: kv(0) returns nan
#for df>15 or so, we also get nans in the result in neighborhood of 0
cf = np.ones((), cf.dtype)  #return same dtype as cf would
else:
cf[t == 0] = 1

return cf

def chf_t_(t, df, loc=0, scale=1):
#not much, but a bit better with log
vhalf = df / 2.
term = np.sqrt(df) * np.abs(t*scale)
cf = np.log(special.kv(vhalf, term)) + vhalf * np.log(term)
cf = cf - special.gammaln(vhalf) - (vhalf - 1) * np.log(2)
cf = cf + 1j * loc * t
if cf.shape == () and t == 0:
#special case: kv(0) returns nan
#for df>15 or so, we also get nans in the result in neighborhood of 0
cf = np.zeros((), cf.dtype)  #return same dtype as cf would
else:
cf[t == 0] = 0
return np.exp(cf)

def chfn(self, t, *args, **kwds):
return chf_normal(t, *args, **kwds)

#monkeypatch scipy.stats
stats.distributions.norm_gen._chf = chfn

t = np.linspace(-1, 1, 11)
print stats.norm._chf(t, loc=1, scale=2)
print chf_t(t, 50, loc=1, scale=2)
print chf_t_(t, 50, loc=1, scale=2)
```

Editorial note: I had written this initially for the scipy-user mailing list. (Old habits are difficult to break.) But I remembered just before hitting Send that the recommendation is to put it in a blog.

#### Application and Plot: Wrapped Circular T Distribution

As explained in my previous post, once we have the characteristic function of a distribution defined on the real line, it is simple to get the Fourier approximation for the wrapped circular distribution. As an application of the characteristic function of the t distribution, I constructed the wrapped circular distributions.

The following plot shows an example of the density functions of the wrapped Cauchy, the wrapped normal distribution, and the wrapped t distribution for a few values of the degrees of freedom. Normal and Cauchy distributions are the two extreme cases of the t distribution, when the degrees of freedom go to infinity and when the degrees of freedom is one, respectively.

The distribution in the plot have the same location and scale parameter. However, this implies that the variance of the distributions is not the same. As a different comparison we could have adjusted the scale parameter to obtain distributions with identical variance. The latter is a more informative comparison when we are estimating the parameters based on data, and the estimated distribution reflects a similar variance as the data.

The fatter tails of Cauchy and t distributions with small t are clearly visible in the plot.

1. Cool stuff. This post inspired me to try this in sympy.stats.

The t-distribution gives a pretty scary result. I'm not competent enough in statistics to validate the results. Does this look decent?

https://gist.github.com/4186709

The Cauchy distribution was nicer.

https://gist.github.com/4186685

It felt nice to write
>>> E(exp(I*t*X))
and get something meaningful

2. Thanks, I didn't know sympy has now a larger collection of distributions.
This looks good, but I don't think it would solve my problem in the neighborhood of zero for the t-distribution.

Checking Wikipedia http://en.wikipedia.org/wiki/Bessel_function#Modified_Bessel_functions_:_I.CE.B1.2C_K.CE.B1 we find the relationship between Bessel kv and iv. But a quick check with scipy.special.iv for my example numbers shows that your expression should be 0/0 at t=0 and run into similar numerical problems in the neighborhood of zero.

(I'm not an expert, just a user of special functions, so my quick reading might not be correct.)

I forgot to add in my post above:
Numerical integration with scipy.quad has no problems with calculating the characteristic function in the neighborhood of zero for degrees of freedom where the characteristic function of the t distribution agrees with the normal distribution at more than 10 decimals (df=1000 or 10000).

In the expression that sympy gives you, exp(-x) is translated into cosh and sinh terms, see last line in this section http://en.wikipedia.org/wiki/Hyperbolic_function#Useful_relations

Staying with exp is a bit easier in my eyes
http://en.wikipedia.org/wiki/Cauchy_distribution#Characteristic_function and shows more closely the relationship to stable and similarity and differences to normal distribution.

As aside:
Last time I tried sympy for this, I was looking for the derivative of the (log)pdf of the t distribution with respect to the parameters, and I was very positively surprised when it gave me the name of the special function that is the derivative of the gamma function (which I had never heard of before).

3. It looks like Blogger doesn't render links in comments. I didn't find any option to enable it.

4. > But a quick check with scipy.special.iv for my example numbers shows that your expression should be 0/0 at t=0 and run into similar numerical problems in the neighborhood of zero

SymPy may know how to handle 0/0 better than Python does. We also get to cheat by using high precision math when we need to.

In [41]: X = StudentT("x", 51)
In [42]: f = simplify(E(exp(I*t*X)))
In [44]: simplify(f.subs(t, .0000000001)).evalf(30)
Out[44]: 0.999999999999999999745000000121

In [48]: simplify(f.subs(t, .000000001))
Out[48]:
⎛ ____⎞ ____ ⎛ ____⎞ ____
- 1.0⋅sinh⎝1.0e-9⋅╲╱ 51 ⎠ - 1.0e-9⋅╲╱ 51 ⋅sinh⎝1.0e-9⋅╲╱ 51 ⎠ + 1.0e-9⋅╲╱ 51 ⋅

⎛ ____⎞ ⎛ ____⎞
cosh⎝1.0e-9⋅╲╱ 51 ⎠ + 1.0⋅cosh⎝1.0e-9⋅╲╱ 51 ⎠

5. Regarding exp(-x) vs sinh/cosh yes, I agree. There are lots of cases like this where Sympy makes a decision about what is simplest and that decision that can be disputed. It's hard to design a simple system that makes everything look good for a variety of cases.

I'm curious, what do you actually use characteristic functions for? We've thought about adding these more naturally into SymPy but I couldn't find a motivating use case.

6. some uses for characteristic functions
http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29#Uses

For sympy some of the theoretical usages would be interesting, convolution, limit theorems, moments from derivatives.

I use it currently to get the circular moments of distributions that are wrapped around the unit circle, x mod(2pi) where x is normal, Cauchy or t distributed.

Other usecases I looked at but haven't much coded yet, are in either estimation (for example the stable distribution doesn't have an explicit pdf but has an explicit characteristic function) and in goodness of fit tests.

7. I collected my examples into a matching blogpost.

http://matthewrocklin.com/blog/work/2012/12/03/Characteristic-Functions/

8. Thank Matthew, your examples are very useful to see what sympy can do now, and how to do it.

After your previous comments I read partially through the source of the new sympy.stats (instead of sympy.statistics that I had looked at before.) It's a nice collection of distributions that will come in handy for some derivations of distributions.

The main difference between sympy.stats and scipy.stats/statsmodels is that we mainly care about the numerical results and we try to get as much vectorized as possible, so we have more speed (and less precision) when we have to evaluate a function a few thousand times.