Tuesday, September 14, 2010

Statsmodels - Ordinary Least Squares

Here is a simple example of performing ordinary least squares using Scikits Statsmodels. Extra column of '1's in X matrix was added to determine the intercept of the regression line, sm.add_constant() call could also be used to add this extra column.
import numpy as np
import scikits.statsmodels as sm

y = [11,14,19,26]

X = [[1,1],[2,1],[3,1],[4,1]]

olsmod = sm.OLS(y, X)
olsres = olsmod.fit()

print olsres.params
print olsres.bse

# predict using new data points
ypred = olsmod.predict([[5,1],[6,1]])
print ypred

Friday, September 10, 2010

GLM and Python, scikits.statsmodels

Python version of the code for ARM 6.2 uses GLM() call under the scikits.statsmodels package. AFAIK there is no equivalent call for R factor() in this package. Also, in order to get the intercept as in R, we need to add an extra column of '1's. Other than that, modified R calls (withouts factors) and Python GLM calls match exactly.
import numpy
import scikits.statsmodels as sm
from scipy import stats

data = numpy.loadtxt("frisk_with_noise.dat", skiprows=7)

X = numpy.zeros((3,len(data[:,0])))
print X.shape

arrests = data[:,2]
arrests[arrests == 0] = 1
arrests = numpy.log(arrests)

stops = data[:,0]
stops[stops==0.0] = .0001

X[0,:] = arrests # arrests
X[1,:] = data[:,4] # eth
X[2,:] = numpy.ones(len(data[:,0])) # eth

glm = sm.GLM(stops, X.T, family=sm.family.Poisson())
res = glm.fit()
Google Groups

Thursday, August 26, 2010

GCSR - Example 3.7

Attached is the Python source code for GCSR Example 3.7, the bioassay problem. The code is based on Helle Sorensen's code at KU. It samples from alpha, beta posteriors using grid approximation and displays a contour plot, and then, samples from the joint distribution and displays an X-Y plot of the sampled values. The plots duplicate Figure 3.4 (a) and (b) from GCSR.

Code

Wednesday, August 18, 2010

GCSR

I will publish sample code (Python), and solutions for selected questions found in Andrew Gelman's excellent book Bayesian Data Analysis 2nd Edition (GCSR). Each posting will be solution for one question, if required, will include mathematical formulas (using mathurl.com), and Python, R code. R code will be copy and paste from Gelman's own solutions PDF. We might also include code from Guilherme Rocha (another Bayesian Statistics lecturer), or from Jeff Hart (Texas A&M), or from Brian Junker (Carnegie Mellon University).

To install R on ubuntu, apt-get install r-base and r-base-dev.

You can run the R code from command line using "R -f [file]". If there are plots displayed in the R code, they will be dumped into a file called Rplots.pdf in your current directory.

Github