Friday, February 10, 2012

Starting a list of diagnostic and specification tests

I have already started an inventory of statistical tests in Python (scipy.stats and statsmodels), compared to R, several times.

Here is another try.

It is mainly a comparison with the R package lmtest. I just spent most of two days writing tests against R, and before that some days writing tests against Gretl, and before that outlier measures against SAS, as described in the previous posts.

I don't know yet how easy it will be to maintain a list like this. The current version is mainly based on parsing the lmtest index page. lmtest is not very complete; there are tests covered in other packages and additional tests covered in Gretl.

For now I just keep it in a boring python module, with a string that's easy to manipulate and convert.

#cols: category, name, statsmodels, r_lmtest, gretl
s4 = '''\
acorr | Breusch-Godfrey Test | acorr_breush_godfrey | bgtest
acorr | Durbin-Watson Test | location_no_pvalues | dwtest
het | Breusch-Pagan Test | het_breush_pagan | bptest
het | Goldfeld-Quandt Test | het_goldfeld_quandt | gqtest
het | Harrison-McCabe test | missing | hmctest
het | White test | het_white | - |
causality | Test for Granger Causality | grangercausalitytest | grangertest
linear | Harvey-Collier Test | missing | harvtest
linear | PE Test for Linear vs. Log-Linear Specifications | missing | petest
linear | Rainbow Test | missing | raintest
func form | RESET Test | with outliers | resettest
compare nonnested | Cox Test | compare_cox | coxtest
compare nonnested | J Test | compare_j | jtest
compare nonnested | Encompassing Test | missing | encomptest
compare nested | Likelihood Ratio Test nested | compare_lr | lrtest
compare nested | Wald Test nested | compare_ftest | waldtest
coef | Testing Estimated Coefficients | t_test | coeftest
coef | Testing Estimated Coefficients | missing | coeftest.breakpointsfull
'''

# add another separator
print('\n'.join(line + '|' for line in s4.split('\n')))
# convert to list of lists
def str2list(ss, sep='|', keep_empty=4):
Unfortunately, copying into Blogger doesn't preserve indentation, so I skip this. Instead, convert the separator to tabs, so that a Google spreadsheet separates the cells:
print('\n'.join(line.replace('|', '\t') for line in s4.split('\n')))
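
Since the body of the conversion function did not survive the copy, here is a minimal sketch of what such a helper could look like. It is not the original code: the name str2list_sketch is made up, and reading keep_empty as "pad each row with empty cells up to this many columns" is only an assumption based on the signature.

```python
def str2list_sketch(ss, sep='|', keep_empty=4):
    """Split a block of sep-delimited lines into a list of lists.

    Assumption: keep_empty is the minimum number of cells per row;
    shorter rows are padded with empty strings.
    """
    rows = []
    for line in ss.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = [cell.strip() for cell in line.split(sep)]
        # pad short rows; rows that are already long enough are unchanged
        cells += [''] * (keep_empty - len(cells))
        rows.append(cells)
    return rows


# three rows copied from the table string above
sample = '''\
acorr | Breusch-Godfrey Test | acorr_breush_godfrey | bgtest
het | Harrison-McCabe test | missing | hmctest
het | White test | het_white | - |
'''
rows = str2list_sketch(sample)

# names of the tests that are marked as missing in the statsmodels column
missing = [row[1] for row in rows if row[2] == 'missing']
```

With the rows as lists, filtering on the statsmodels column gives a quick to-do list of tests that still need to be written.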

Tomorrow, I will start looking for the diagnostic tests that are not yet on the list. R stats has some, for example (fm is the linear model result of my test case):

> names(ls.diag(fm))
 [1] "std.dev"      "hat"          "std.res"      "stud.res"     "cooks"        "dfits"      
 [7] "correlation"  "std.err"      "cov.scaled"   "cov.unscaled"
I figured out that JSON works pretty well for transferring data from some R animals to Python.
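
As a rough illustration of the Python side of such a transfer, assuming the R side has already written a few of the ls.diag entries to a JSON string with some R JSON package (the string below is hand-written and the numbers are invented, not actual results):

```python
import json

# Hypothetical JSON text as it might arrive from R; values are made up.
r_json = '{"std.dev": 1.5, "hat": [0.2, 0.3, 0.5], "cooks": [0.01, 0.2, 0.05]}'

diag = json.loads(r_json)

# R names often contain dots, which are not valid in Python identifiers,
# so replace them with underscores.
diag = dict((str(key).replace('.', '_'), value) for key, value in diag.items())

for name in sorted(diag):
    print(name, diag[name])
```

Scalars come back as floats and R vectors as plain lists, which can then be converted to numpy arrays for comparison in a test.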

In other news:
Basic R doesn't automatically save the session log or history. I was playing with Rcommander last night to see what default diagnostics it proposes, and it crashed R after a while. Unfortunately, I hadn't saved my session log and script file, so a day or two of work died with it. I had stopped thinking about safeguarding against crashes: Windows never crashes, not even the kids manage to turn it off anymore, and Spyder and Firefox always recover after a crash with an existing history or session log. R had never crashed before.

