
Saturday, October 1, 2011

Optical Flow, Lucas Kanade in Python

The following is the Lucas-Kanade optical flow algorithm in Python. We used it successfully on two PNG images, as well as with OpenCV to follow a point across successive frames. More details are at GitHub.
import numpy as np
import scipy.signal as si
from PIL import Image

def gauss_kern():
   """ Returns a normalized 2D gauss kernel array for convolutions """
   h1 = 15
   h2 = 15
   x, y = np.mgrid[0:h2, 0:h1]
   x = x-h2/2
   y = y-h1/2
   sigma = 1.5
   g = np.exp( -( x**2 + y**2 ) / (2*sigma**2) )
   return g / g.sum()

def deriv(im1, im2):
   g = gauss_kern()
   Img_smooth = si.convolve(im1, g, mode='same')
   fx, fy = np.gradient(Img_smooth)
   # temporal derivative: difference of 2x2 block averages of the two frames
   ft = si.convolve2d(im1, 0.25 * np.ones((2,2))) + \
       si.convolve2d(im2, -0.25 * np.ones((2,2)))
   # trim the last row/column so all three derivatives have the same shape
   fx = fx[0:fx.shape[0]-1, 0:fx.shape[1]-1]
   fy = fy[0:fy.shape[0]-1, 0:fy.shape[1]-1]
   ft = ft[0:ft.shape[0]-1, 0:ft.shape[1]-1]
   return fx, fy, ft

import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as si
from PIL import Image
import deriv   # the deriv() module above, saved as deriv.py
import numpy.linalg as lin

def lk(im1, im2, i, j, window_size):
   fx, fy, ft = deriv.deriv(im1, im2)
   halfWindow = int(np.floor(window_size/2))   # slice bounds must be integers
   curFx = fx[i-halfWindow-1:i+halfWindow,
              j-halfWindow-1:j+halfWindow]
   curFy = fy[i-halfWindow-1:i+halfWindow,
              j-halfWindow-1:j+halfWindow]
   curFt = ft[i-halfWindow-1:i+halfWindow,
              j-halfWindow-1:j+halfWindow]
   curFx = curFx.T
   curFy = curFy.T
   curFt = curFt.T

   curFx = curFx.flatten(order='F')
   curFy = curFy.flatten(order='F')
   curFt = -curFt.flatten(order='F')
 
   # least-squares solution of A [u v]^T = b, where b = -ft (curFt already carries the minus sign)
   A = np.vstack((curFx, curFy)).T
   U = np.dot(np.dot(lin.pinv(np.dot(A.T,A)),A.T),curFt)
   return U[0], U[1]
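
A minimal usage sketch (mine, not from the repository), assuming the two snippets above are saved as deriv.py and lk.py, and that frame1.png and frame2.png are two consecutive grayscale frames:

import numpy as np
from PIL import Image
from lk import lk

im1 = np.array(Image.open("frame1.png").convert("L"), dtype=float)
im2 = np.array(Image.open("frame2.png").convert("L"), dtype=float)

# estimated flow (u, v) at pixel (100, 100) with a 15x15 window
u, v = lk(im1, im2, 100, 100, 15)
print u, v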

Monday, August 22, 2011

Plotting a Complex Exponential

We rewrote one of the MIT OCW 18.03 ODE Mathlets in Python; this mathlet plots complex exponentials. As t varies, (a+bi)t traces the line y = (b/a)x in the complex plane, while e^{(a+bi)t} = e^{at}(cos bt + i sin bt) traces a spiral.
from pylab import *

from matplotlib.widgets import Slider

ax = subplot(121)
subplots_adjust(left=0.1, bottom=0.25)
l1, = plot([], [], lw=2, color='red')
axis([-1, 1, -8, 8])
title ('$(a + bi)t$', color='blue')
grid()

ax = subplot(122)
subplots_adjust(left=0.1, bottom=0.25)
l2, = plot([], [], lw=2, color='red')
axis([-3, 3, -3, 3])
title ('$e^{(a + bi)t}$', color='blue')
grid()

axcolor = 'lightgoldenrodyellow'
axa = axes([0.15, 0.1, 0.65, 0.03], axisbg=axcolor)
axb = axes([0.15, 0.15, 0.65, 0.03], axisbg=axcolor)

slidera = Slider(axa, 'a', -1.0, 1.0, valinit=0)
sliderb = Slider(axb, 'b', -8.0, 8.0, valinit=0)

def update(val):
    a = slidera.val
    b = sliderb.val
    t = arange(-1.0, 1.0, 0.001)
    if a != 0:
        # the trajectory of (a+bi)t is the line y = (b/a) x
        l1.set_xdata(t)
        l1.set_ydata((b/a)*t)
    else:
        # when a == 0 the trajectory is the vertical axis
        l1.set_xdata(zeros(len(t)))
        l1.set_ydata(linspace(-8, 8, len(t)))

    t = arange(-3.0, 3.0, 0.001)
    l2.set_xdata(exp(a*t)*cos(b*t))
    l2.set_ydata(exp(a*t)*sin(b*t))
    draw()

slidera.on_changed(update)
sliderb.on_changed(update)
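
# draw the initial curves once before the window opens (a small addition, not in the original mathlet)
update(0)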

show()


Saturday, July 23, 2011

Clustering, Image Segmentation, Eigenvectors and Python

Here is example code for eigenvector-based segmentation in Python. For more details, see the code here.
import matplotlib.pyplot as plt
import numpy as np

Img = plt.imread("twoObj.jpg")
n = Img.shape[0]
Img2 = Img.flatten(order='C')
nn = Img2.shape[0]

A = np.zeros((nn,nn))

for i in range(nn):
 for j in range(nn):
     A[i,j]=np.exp(-((Img2[i]-Img2[j])**2))

V,D = np.linalg.eig(A)   # V: eigenvalues, D: eigenvectors as columns
V = np.real(V)
a = np.real(D[:,0])      # an eigenvector is a *column* of D, not a row
print a

threshold = 0 # filter
a = np.reshape(a, (n,n))
Img[a<threshold] = 255
plt.imshow(Img)
plt.show()
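
Building A with the double loop above does nn*nn Python-level iterations; a vectorized equivalent (the same matrix, with a cast to float to avoid uint8 wraparound, assuming the nn x nn array fits in memory) is:

I2 = Img2.astype(float)
A = np.exp(-(I2[:, None] - I2[None, :]) ** 2)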



Saturday, April 9, 2011

Myers-Briggs Test in Javascript / Python

Hunch.com apparently uses this method. The Myers-Briggs Test is a psychological profiling system, expanded upon later by David Keirsey. We coded the evaluation scheme in Python, making a few additions. In the original version in David Keirsey's book Please Understand Me II, the answer to each question is either A or B; the Python code maps these choices to -1 and +1 and then sums the appropriate array values. Recent versions of this questionnaire carry more (sometimes even five) choices. We found that an additional 'neutral' choice was an improvement, so our version carries three answers: the evaluation code substitutes -1 for A, +1 for B, and 0 for neutral.

Keirsey's book evaluates the answers column by column; we simply generate the indexes that correspond to those columns (answers arrive in a flat list, numbered from 1 to 70), then do the addition.
import logging

def calculate_mb(choices):

  new_choices = []

  # question k belongs to column (k-1) % 7, so gather every 7th answer
  for i in range(1,8):
      new_choices.append([int(choices[j-1]) for j in range(i,71,7)])

  res = list("XXXX")

  ei = sum(new_choices[0])
  if ei < 0: res[0] = 'E'
  else: res[0] = 'I'

  sn = sum(new_choices[1]) + sum(new_choices[2])
  if sn < 0: res[1] = 'S'
  else: res[1] = 'N'

  tf = sum(new_choices[3]) + sum(new_choices[4])
  if tf < 0: res[2] = 'T'
  else: res[2] = 'F'

  jp = sum(new_choices[5]) + sum(new_choices[6])
  if jp < 0: res[3] = 'J'
  else: res[3] = 'P'

  logging.debug(choices)

  return str(''.join(res))
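
A quick smoke test (my own addition) with random answers, where each answer is -1 (A), +1 (B) or 0 (neutral):

import random
answers = [random.choice([-1, 0, 1]) for _ in range(70)]
print calculate_mb(answers)   # prints one of the 16 types, e.g. 'ENTP'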

Another version, in HTML using Javascript, can be found here.

Monday, March 21, 2011

ARIMA, Forecasting and Python

I ported the R code found on Rob Hyndman's blog to Python + rpy2. The rpy2 package allows calling R code from Python, which we use here to access the forecast package. The test R and Python code can also be found in the zip file below.
import rpy2.robjects as R
import rpy2.rinterface as rinterface
from rpy2.robjects.packages import importr
import rpy2.robjects.numpy2ri
from rpy2.robjects.vectors import FloatVector, StrVector
import datetime
import numpy as np
forecast = importr("forecast")
R.r.source("arima-fn.R")

x = np.loadtxt("garan.dat",  skiprows=1)
n = len(x)
m = 10
terms = 4

x = np.insert(x, 10, np.nan) # put in a NaN to see what happens
n += 1

print x

R.r.assign('x',x)
R.r.assign('n',n)
R.r.assign('m',m)
R.r.assign('terms',terms)

R.r('fit = Arima(x=x, order=c(2,0,1), xreg=fourier(1:n,terms,m))')
R.r('f = forecast(fit, h=2*m, xreg=fourier(n+1:(2*m),terms,m))')
res = R.r('forecast_results(f)')
for x in res:
  print x
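
For intuition, the fourier() helper defined in the R listings below builds a matrix of seasonal sine/cosine regressors; a rough numpy equivalent (a reference sketch, not used by the code above; fourier_py is my own name) is:

import numpy as np

def fourier_py(t, terms, period):
    # t: 1-D array of 1-based time indices, as in the R code
    t = np.asarray(t, dtype=float)
    X = np.empty((len(t), 2 * terms))
    for i in range(1, terms + 1):
        X[:, 2*i - 2] = np.sin(2 * np.pi * i * t / period)
        X[:, 2*i - 1] = np.cos(2 * np.pi * i * t / period)
    return X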

All Code

arima-fourier2.R

forecast_results <- function(res)
{
  return (res$mean)
}

fourier <- function(t,terms,period)
{
  print (terms)
  n <- length(t)
  X <- matrix(,nrow=n,ncol=2*terms)
  for(i in 1:terms)
  {
    X[,2*i-1] <- sin(2*pi*i*t/period)
    X[,2*i] <- cos(2*pi*i*t/period)
  }
  colnames(X) <- paste(c("S","C"),rep(1:terms,rep(2,terms)),sep="")
  return(X)
}

library(forecast)

yy <- read.table ("garan.dat", header=T)
y <- yy[,'price']

n <- length(y)
print (n)
m <- 10 # how many points in the future
terms = 4

fit <- Arima(y, order=c(2,0,1), xreg=fourier(1:n,terms,m))

f = forecast(fit, h=2*m, xreg=fourier(n+1:(2*m),terms,m))

print (forecast_results(f))

plot(f)

arima-fourier.R

n <- 2000
m <- 200
y <- ts(rnorm(n) + (1:n)%%100/30, f=m)
fourier <- function(t,terms,period)
{
  n <- length(t)
  X <- matrix(,nrow=n,ncol=2*terms)
  for(i in 1:terms)
  {
    X[,2*i-1] <- sin(2*pi*i*t/period)
    X[,2*i] <- cos(2*pi*i*t/period)
  }
  colnames(X) <- paste(c("S","C"),rep(1:terms,rep(2,terms)),sep="")
  return(X)
}

library(forecast)
fit <- Arima(y, order=c(2,0,1), xreg=fourier(1:n,4,m))
f = forecast(fit, h=2*m, xreg=fourier(n+1:(2*m),4,m))
plot(f)

print (y)
print (f)


garan.dat

price
6.40000
6.33000
6.43000
6.53000
6.48000
6.33000
6.28000
6.33000
6.33000
6.18000
6.28000
6.28000
6.43000
6.48000
6.23000
6.38000
6.33000
6.23000
6.45000
6.28000
6.38000
6.43000
6.45000
6.23000
6.04000
5.84000
5.94000
5.84000
5.95000
5.89000
5.94000
6.00000
6.35000
6.13000
6.38000
6.23000
6.04000
5.79000
5.49000
5.70000


Thursday, March 10, 2011

Image Segmentation using Active Contours, Level Sets

Here is new image segmentation Python code, a port of this Matlab file. It uses level sets and mean curvature motion, and is able to segment the sample image after a few iterations. We had to write our own bwdist(), a wrapper around scipy's distance_transform_edt, with some conversions so that it behaves the same way as Matlab's bwdist(). We also changed the initialization of phi, making it simpler and closer to Dr. Li's code we shared before.

Code

import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal
import scipy.ndimage as image
import time
from scipy import ndimage

def bwdist(a):
   """
   this is an intermediary function, 'a' has only True, False vals,
   so we convert them into 0, 1 values -- in reverse. True is 0,
   False is 1, distance_transform_edt wants it that way.
   """
   b = np.ones(a.shape)
   b[a==True] = 0.
   return ndimage.distance_transform_edt(b)

def gauss_kern():
   """ Returns a normalized 2D gauss kernel array for convolutions """
   h1 = 15
   h2 = 15
   x, y = np.mgrid[0:h2, 0:h1]
   x = x-h2/2
   y = y-h1/2
   sigma = 1.5
   g = np.exp( -( x**2 + y**2 ) / (2*sigma**2) );
   return g / g.sum()

Img = plt.imread("twoObj.bmp")
Img = Img[::-1]
g = gauss_kern()
Img_smooth = signal.convolve(Img,g,mode='same')
Iy,Ix=np.gradient(Img_smooth)
absGradI=np.sqrt(Ix**2+Iy**2);
rows, cols = Img.shape

# initial function phi - level set is a square 4 pixels
# away from borders on each side, in 3D it looks like an empty
# box
c0=4
w=4
nrow, ncol=Img.shape
phi=c0*np.ones((nrow,ncol))
phi[w+1:-w-1, w+1:-w-1]=-c0

# edge-stopping function
g = 1 / (1+absGradI**2)

# gradient of edge-stopping function
gy,gx = np.gradient(g)

# gradient descent step size
dt=.4

# number of iterations after which we reinitialize the surface
num_reinit=10

phiOld=np.zeros((rows,cols))

# iteration counter
iter=0

plt.ion()

while np.sum(np.sum(np.abs(phi-phiOld))) != 0:
   # gradient of phi
   gradPhiY, gradPhiX = np.gradient(phi)   
   # magnitude of gradient of phi
   absGradPhi=np.sqrt(gradPhiX**2+gradPhiY**2)
   # normalized gradient of phi - eliminating singularities
   normGradPhiX=gradPhiX/(absGradPhi+(absGradPhi==0))
   normGradPhiY=gradPhiY/(absGradPhi+(absGradPhi==0))
  
   divYnormGradPhiX, divXnormGradPhiX=np.gradient(normGradPhiX)
   divYnormGradPhiY, divXnormGradPhiY=np.gradient(normGradPhiY)
                         
   # curvature is the divergence of normalized gradient of phi
   K = divXnormGradPhiX + divYnormGradPhiY   
   tmp1 = g * K * absGradPhi
   tmp2 = g * absGradPhi
   tmp3 = gx * gradPhiX + gy*gradPhiY
   dPhiBydT = tmp1 + tmp2 + tmp3
   phiOld=phi
   # level set evolution equation   
   phi = phi + ( dt * dPhiBydT )
   iter=iter+1
   if np.mod(iter,num_reinit)==0:
       # reinitialize the embedding function after num_reinit iterations
       phi=np.sign(phi)
        phi = (phi > 0) * (bwdist(phi != 0))
      
   if np.mod(iter,10)==0:
       time.sleep(0.6)
       plt.imshow(Img, cmap='gray')
       plt.hold(True)
       CS = plt.contour(phi,0, colors='r')
       plt.draw()
       plt.hold(False)

Sunday, January 30, 2011

Heat Equation solution using Finite Difference and Crank-Nicolson

I rewrote parts of this code to use dense (non-sparse) matrices instead of sparse ones, for demonstration purposes. The code is below:
import numpy as np
import scipy.linalg

# Number of internal points
N = 200

# Calculate Spatial Step-Size
h = 1/(N+1.0)
k = h/2

x = np.linspace(0,1,N+2)
x = x[1:-1] # get rid of the '0' and '1' at each end

# Initial Conditions
u = np.transpose(np.mat(10*np.sin(np.pi*x)))

# second derivative matrix
I2 = -2*np.eye(N)
E = np.diag(np.ones((N-1)), k=1)
D2 = (I2 + E + E.T)/(h**2)

I = np.eye(N)

TFinal = 1
NumOfTimeSteps = int(TFinal/k)

for i in range(NumOfTimeSteps):
    # Solve the System: (I - k/2*D2) u_new = (I + k/2*D2)*u_old
    A = (I - k/2*D2)
    b = np.dot((I + k/2*D2), u)
    u = scipy.linalg.solve(A, b)
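
Each pass of the loop advances the solution one Crank-Nicolson step, which averages the explicit and implicit Euler updates:

$$\Big(I - \frac{k}{2}D_2\Big)u^{n+1} = \Big(I + \frac{k}{2}D_2\Big)u^{n}, \qquad (D_2 u)_i = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2}$$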
Once the loop finishes, the final values are in u. More detailed code, along with the sparse matrix version, is here.

Saturday, January 15, 2011

Discrete Laplacian - del2

I ported the Matlab del2() function to Python. Here it is for reference:
import numpy as np

def del2(M):
    dx = 1
    dy = 1
    rows, cols = M.shape
    dx = dx * np.ones((1, cols - 1))
    dy = dy * np.ones((rows - 1, 1))

    mr, mc = M.shape
    D = np.zeros((mr, mc))

    if (mc >= 3):
        ## x direction (needs at least 3 columns)
        ## left and right boundary
        D[:, 0] = (M[:, 0] - 2 * M[:, 1] + M[:, 2]) / (dx[:,0] * dx[:,1])
        D[:, mc-1] = (M[:, mc - 3] - 2 * M[:, mc - 2] + M[:, mc-1]) \
            / (dx[:,mc - 3] * dx[:,mc - 2])

        ## interior points
        tmp1 = D[:, 1:mc - 1]
        tmp2 = (M[:, 2:mc] - 2 * M[:, 1:mc - 1] + M[:, 0:mc - 2])
        tmp3 = np.kron(dx[:,0:mc - 2] * dx[:,1:mc - 1], np.ones((mr, 1)))
        D[:, 1:mc - 1] = tmp1 + tmp2 / tmp3

    if (mr >= 3):
        ## y direction (needs at least 3 rows)
        ## top and bottom boundary
        D[0, :] = D[0,:] + \
            (M[0, :] - 2 * M[1, :] + M[2, :]) / (dy[0,:] * dy[1,:])

        D[mr-1, :] = D[mr-1, :] \
            + (M[mr-3,:] - 2 * M[mr-2, :] + M[mr-1, :]) \
            / (dy[mr-3,:] * dy[mr-2,:])

        ## interior points
        tmp1 = D[1:mr-1, :]
        tmp2 = (M[2:mr, :] - 2 * M[1:mr - 1, :] + M[0:mr-2, :])
        tmp3 = np.kron(dy[0:mr-2,:] * dy[1:mr-1,:], np.ones((1, mc)))
        D[1:mr-1, :] = tmp1 + tmp2 / tmp3

    return D / 4
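
A quick sanity check (mine, not from the original post): for M = x**2 + y**2 the true Laplacian is 4 everywhere, and del2, like Matlab's, returns the Laplacian divided by 4, so every entry should come out as 1.0:

x, y = np.mgrid[0:6, 0:6].astype(float)
M = x**2 + y**2
print del2(M)   # all entries 1.0, exact because M is quadratic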

Level Sets and Image Segmentation with Python

I ported the Matlab code written by Li, Xu, Gui and Fox for their paper Level Set Evolution Without Re-initialization: A New Variational Formulation to Python. Li et al. use level sets with a new approach and are able to segment an image successfully. Screenshots and the code itself are linked below.

Code

Friday, January 14, 2011

Gaussian Convolution, Blurring using Python

import matplotlib.pyplot as plt
import numpy as np
import scipy.signal as signal

def gauss_kern():
    """ Returns a normalized 2D gauss kernel array for convolutions """
    h1 = 15
    h2 = 15
    x, y = np.mgrid[0:h2, 0:h1]
    x = x-h2/2
    y = y-h1/2
    sigma = 1.5
    g = np.exp( -( x**2 + y**2 ) / (2*sigma**2) )
    return g / g.sum()

Img = plt.imread("twoObj.bmp")
Img = Img[::-1]
g = gauss_kern()
Img_smooth = signal.convolve(Img,g,mode='same')
plt.imshow(Img_smooth)
plt.show()
Results:

Before Image



After Image

Saturday, December 18, 2010

Fourier Transform and Sunspots Data

I recoded parts of the Fourier analysis example that uses sunspot data, found here. It needed updating for recent Python. All graphs match the original output.
import numpy as np
from scipy import fft, array
from matplotlib import pyplot as gplt

tempdata = np.loadtxt("sunspots.dat")

year=tempdata[:,0]
wolfer=tempdata[:,1]

gplt.plot(year,wolfer)
gplt.show()

gplt.plot(year[0:50],wolfer[0:50])
gplt.show()

Y=fft(wolfer)
gplt.plot(Y.real,Y.imag, 'r+')
gplt.xlim(-4000,2000)
gplt.ylim(-4000,4000)
gplt.show()

n=len(Y)
power = np.abs(Y[1:(n/2)])**2
nyquist=1./2
freq=array(range(n/2))/(n/2.0)*nyquist
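# nudge frequencies away from zero so period = 1/freq below stays finite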
freq += 0.00001
period=1./freq
gplt.plot(1/period[1:len(period)], power)
gplt.xlabel('cycles/year')
gplt.title('Periodogram')
gplt.show()

gplt.plot(1/period[1:40], power[1:40])
gplt.show()

gplt.plot(period[1:len(period)], power)
gplt.xlim(0,40)
gplt.show()
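
The dominant sunspot cycle appears as the largest peak in the periodogram; assuming the series covers enough cycles, the following should print a period of roughly 11 years:

print period[np.argmax(power) + 1]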

Monday, November 1, 2010

Multilevel Modeling on UK School Data

I came across a dataset on UK schools that has scores and student / school gender information. After some digging on the net, I found a sample Bugs model for this data, so I decided to give it a try. The data file is mentioned at http://dss.princeton.edu/training/Multilevel101.pdf and is available at http://dss.princeton.edu/training/schools.dta.

I converted this data to a Python-readable format and saved it as testing.dat. The sample Bugs model is at http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/reviewwinbugs.pdf

Other useful info: http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/reviewaml.pdf

The example code is written in Python / JAGS. The model can certainly be improved. The dflat() distribution does not seem to be supported anymore, so dnorm(0.0,1.0E-6) was used instead where necessary.
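
Judging from the saved parameters (a, beta, tau.y, tau.a), the model is a varying-intercept regression of roughly this form (my reading of the linked sample model, not the literal contents of testing.bug):

$$y_i \sim N\big(a_{j[i]} + \beta_1\,\mathrm{standlrt}_i + \beta_2\,\mathrm{gender}_i + \beta_3\,\mathrm{boysch}_i + \beta_4\,\mathrm{girlsch}_i,\ \sigma_y^2\big), \qquad a_j \sim N(\mu, \sigma_a^2)$$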

Data and recent code

import rpy2.robjects as R
import rpy2.rinterface as rinterface
from rpy2.robjects.packages import importr
import rpy2.robjects.numpy2ri
from rpy2.robjects.vectors import FloatVector, StrVector
import numpy as np
r2jags = importr('R2jags')

schools = np.loadtxt("testing.dat",  skiprows=1)

n=len(schools)
J = int(np.max(schools[:,0]))   # number of schools, assuming ids run from 1 to J
y = schools[:,2]
school = schools[:,0]
standlrt = schools[:,3] # reading score
gender = schools[:,4] # student gender
school_gender = schools[:,5] # school gender: 1 mixed, 2 boys, 3 girls

boysch = np.zeros(n)
boysch[school_gender == 2] = 1   # loadtxt returns floats, so compare to numbers, not strings

girlsch = np.zeros(n)
girlsch[school_gender == 3] = 1

R.r.assign('n',n)
R.r.assign('J',J)
R.r.assign('y',y)
R.r.assign('school',school)
R.r.assign('standlrt',standlrt)
R.r.assign('gender',gender)
R.r.assign('boysch',boysch)
R.r.assign('girlsch',girlsch)

jags_data = StrVector(("n", "J", "y", "school", "standlrt", "gender", "boysch", "girlsch"))
jags_params = StrVector(("a", "beta","tau.y", "tau.a"))

jags_inits = R.r('''function (){
  list (a=rnorm(J),mu=rnorm(1),sigma.y=runif(1), sigma.a=runif(1))
  }''')

jags_schools = r2jags.jags(data = jags_data, inits = jags_inits,
                           parameters_to_save = jags_params,
                           n_iter = 10000,
                           n_chains = 3, n_thin = 100,
                           model_file = "testing.bug")

print jags_schools



Wednesday, October 13, 2010

N Maximum Elements using Python

This function returns the indices of the N largest elements of a list.
def n_max(values, n):
    # indices of values, sorted so the largest come last
    idx = sorted(range(len(values)), key=values.__getitem__)
    res = idx[len(values)-n:]
    res.reverse()   # largest first
    return res
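
Note that it returns indices, not values:

print n_max([4, 1, 9, 7, 3], 2)   # prints [2, 3] -- the indices of 9 and 7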

Friday, September 24, 2010

Regression with Student-t Errors

In order to fit a regression that assumes the error is Student-t distributed instead of Normal, Statsmodels' TLinearModel is the class to use. In R, the same thing is accomplished with the tlm() function from the "hett" package. We share R and Python sample code that uses the same data and reports results from both functions.

The hett package can be installed in R using install.packages("hett"). TLinearModel is part of a development branch of Statsmodels; you can get it here:

https://code.launchpad.net/~scipystats/statsmodels/devel

test_tlm.py

import numpy as np
from scipy import stats, special, optimize
import scikits.statsmodels as sm
from scikits.statsmodels.miscmodels import TLinearModel

nobs = 60
nvars = 2
df = 3

mm = np.loadtxt("mm.dat",  skiprows=1, usecols = (3,4))

data_exog = mm[:,1]
data_endog = mm[:,0]

data_exog = sm.add_constant(data_exog)

res_ols = sm.OLS(data_endog, data_exog).fit()

kurt = stats.kurtosis(res_ols.resid)
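# excess kurtosis of a t distribution is 6/(df-4); invert it to estimate df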
df_fromkurt = 6./kurt + 4

modp = TLinearModel(data_endog, data_exog)

start_value = 0.1*np.ones(data_exog.shape[1]+2)

start_value[:nvars] = res_ols.params
start_value[-2] = df_fromkurt 
start_value[-1] = np.sqrt(res_ols.scale) #0.5
modp.start_params = start_value

fixdf = np.nan * np.zeros(modp.start_params.shape)
fixdf[-2] = 5

modp.fixed_params = fixdf
modp.fixed_paramsmask = np.isnan(fixdf)
modp.start_params = modp.start_params[modp.fixed_paramsmask]

resp = modp.fit(start_params = modp.start_params, disp=1, method='nm',
                maxfun=10000, maxiter=5000)

print '---------------------------'
print resp.bse
print resp.params


test_tlm.R

library("hett")
data(mm, package = "hett")
attach(mm)
# the second ~ CRSP formula models the heteroscedasticity
tfit2 <- tlm(m.marietta ~ CRSP, ~ CRSP, data = mm, start = list(dof = 3), estDof = TRUE)
summary(tfit2)

#(Intercept) -0.006437   0.008096  -0.795     0.43    
#CRSP         1.206853   0.205935   5.860 2.31e-07 ***

mm.dat

"date" "am.can" "m.marietta" "CRSP"
"1" "Jan.1982" -0.0596 -0.1365 -0.03
"2" "Feb.1982" -0.17 -0.0769 -0.0584
"3" "Mar.1982" 0.0276 -0.0575 -0.0181
"4" "Apr.1982" 0.0058 0.0526 0.0306
"5" "May1982" -0.0106 -0.0449 -0.0397
"6" "June1982" 0.045 -0.0859 -0.0295
"7" "Jul.1982" -0.0243 -0.0742 -0.0316
"8" "Aug.1982" 0.1135 0.6879 0.1176
"9" "Sept.1982" -0.0331 -0.077 0.0075
"10" "Oct.1982" 0.0468 0.085 0.1098
"11" "Nov.1982" -0.0223 0.003 0.0408
"12" "Dec.1982" -0.0026 0.0754 0.0095
"13" "Jan.1983" 0.0166 -0.0412 0.0301
"14" "Feb.1983" 0.0343 -0.089 0.0221
"15" "Mar.1983" 0.0443 0.2319 0.0269
"16" "Apr.1983" 0.1477 0.1087 0.0655
"17" "May1983" 0.1728 0.0375 -0.003
"18" "June1983" -0.0372 0.0958 0.0325
"19" "July1983" -0.0451 0.0174 -0.0374
"20" "Aug.1983" -0.0257 -0.0724 0.0049
"21" "Sept.1983" 0.0509 0.075 0.0105
"22" "Oct.1983" 0.0035 -0.0588 -0.0257
"23" "Nov.1983" 0.1334 -0.062 0.0186
"24" "Dec.1983" -0.0458 -0.0378 -0.0155
"25" "Jan.1984" 0.1199 0.0169 -0.0165
"26" "Feb.1284" -0.0766 -0.0799 -0.044
"27" "Mar.1984" -0.0511 -0.0147 0.0094
"28" "Apr.1984" -0.0194 0.0106 -0.0028
"29" "May1984" -0.0687 -0.0421 -0.0591
"30" "June1984" 0.0928 -0.0036 0.0158
"31" "July1984" -0.0704 0.0876 -0.0238
"32" "Aug.1984" 0.0905 0.1025 0.1031
"33" "Sept.1984" 0.0232 -0.0499 -0.0065
"34" "Oct.1984" -0.0054 0.1953 -0.0067
"35" "Nov.1984" 0.0082 -0.0714 -0.0167
"36" "Dec.1984" 0.0242 0.0469 0.0188
"37" "Jan.1985" 0.0153 0.1311 0.0733
"38" "Feb.1985" 0.0016 0.0461 0.0105
"39" "Mar.1985" 0.028 -0.0328 -0.007
"40" "Apr.1985" 0.0088 -0.0096 -0.0099
"41" "May1985" 0.0734 0.1272 0.0521
"42" "June1985" 0.0315 -0.0077 0.0117
"43" "July1985" -0.0276 0.0165 -0.0099
"44" "Aug.1985" 0.0162 -0.015 -0.0102
"45" "Sept.1985" -0.0975 -0.1479 -0.0428
"46" "Oct.1985" 0.0563 -0.0065 0.0376
"47" "Nov.1985" 0.1368 0.039 0.0628
"48" "Dec.1985" -0.069 0.0223 0.0391
"49" "Jan.1986" 0.1044 -0.069 2e-04
"50" "Feb.1986" 0.1636 0.1338 0.0688
"51" "Mar.1986" -0.019 0.1458 0.0486
"52" "Apr.1986" -0.0746 0.0063 -0.0174
"53" "May1986" 0.0433 0.0692 0.046
"54" "June1986" 0.0306 -0.0239 0.01
"55" "July1986" 0.0636 -0.0568 -0.0594
"56" "Aug.1986" 0.0917 0.0814 0.068
"57" "Sept.1986" -0.0796 -0.0889 -0.0839
"58" "Oct.1986" 0.0778 -0.0887 0.0481
"59" "Nov.1986" -0.0353 0.1037 0.0136
"60" "Dec.1986" -0.0137 -0.1163 -0.0322

Tuesday, September 14, 2010

Statsmodels - Ordinary Least Squares

Here is a simple example of performing ordinary least squares using Scikits Statsmodels. An extra column of '1's was added to the X matrix to give the regression line an intercept; the sm.add_constant() call could also be used to add this column.
import numpy as np
import scikits.statsmodels as sm

y = [11,14,19,26]

X = [[1,1],[2,1],[3,1],[4,1]]

olsmod = sm.OLS(y, X)
olsres = olsmod.fit()

print olsres.params
print olsres.bse

# predict using new data points
ypred = olsmod.predict([[5,1],[6,1]])
print ypred
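
The same X can be built with add_constant (a sketch; older scikits.statsmodels releases appended the constant column by default):

import numpy as np
X = sm.add_constant(np.array([1., 2., 3., 4.]))   # appends the column of 1's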

Friday, September 10, 2010

GLM and Python, scikits.statsmodels

The Python version of the code for ARM 6.2 uses the GLM() call from the scikits.statsmodels package. AFAIK there is no equivalent of R's factor() in this package. Also, in order to get an intercept as in R, we need to add an extra column of '1's. Other than that, the modified R calls (without factors) and the Python GLM calls match exactly.
import numpy
import scikits.statsmodels as sm
from scipy import stats

data = numpy.loadtxt("frisk_with_noise.dat", skiprows=7)

X = numpy.zeros((3,len(data[:,0])))
print X.shape

arrests = data[:,2]
arrests[arrests == 0] = 1
arrests = numpy.log(arrests)

stops = data[:,0]
stops[stops==0.0] = .0001

X[0,:] = arrests # log arrests
X[1,:] = data[:,4] # eth
X[2,:] = numpy.ones(len(data[:,0])) # intercept

glm = sm.GLM(stops, X.T, family=sm.family.Poisson())
res = glm.fit()
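
Since there is no factor() equivalent, a categorical predictor has to be dummy-coded by hand. A minimal sketch, assuming the eth column takes integer levels 1, 2, 3 and using level 1 as the baseline:

eth = data[:,4]
dummies = numpy.column_stack([(eth == lvl).astype(float) for lvl in (2, 3)])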

Thursday, August 26, 2010

GCSR - Example 3.7

Attached is the Python source code for GCSR Example 3.7, the bioassay problem. The code is based on Helle Sorensen's code at KU. It samples from the alpha, beta posterior using grid approximation and displays a contour plot; then it samples from the joint distribution and displays an X-Y plot of the sampled values. The plots duplicate Figure 3.4 (a) and (b) from GCSR.
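
The grid approximation itself is short; here is a minimal sketch using the bioassay data from GCSR (the grid ranges and sample count are my own choices, not Sorensen's):

import numpy as np

# bioassay data from GCSR: dose, number of animals, number of deaths
x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])

# evaluate the log posterior (flat prior) on an (alpha, beta) grid
A, B = np.meshgrid(np.linspace(-5, 10, 200), np.linspace(-10, 40, 200))
logpost = np.zeros(A.shape)
for xi, ni, yi in zip(x, n, y):
    theta = 1.0 / (1.0 + np.exp(-(A + B * xi)))    # inverse logit
    theta = np.clip(theta, 1e-12, 1 - 1e-12)
    logpost += yi * np.log(theta) + (ni - yi) * np.log(1 - theta)

# normalize and sample (alpha, beta) pairs from the grid
p = np.exp(logpost - logpost.max()).ravel()
idx = np.random.choice(p.size, size=1000, p=p / p.sum())
alphas, betas = A.ravel()[idx], B.ravel()[idx]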

Code

Tuesday, August 17, 2010

Sampling With Replacement Using Weights in Python

Here is a Python function corresponding to the sample() call in R. We based it on the code here; we only changed it so that the inputs are separate weight and value vectors instead of one vector of (weight, value) tuples.
import random

w = [10, 100, 890]
v = ["low", "mid", "large"]

def weighted_sample(ws, vs, n):
    total = float(sum(ws))
    i = 0
    w = ws[0]
    v = vs[0]
    while n:
        # x is distributed as the minimum of n uniform draws on [0, total]
        x = total * (1 - random.random() ** (1.0 / n))
        total -= x
        while x > w:
            # walk forward until the draw lands inside item i's remaining weight
            x -= w
            i += 1
            w = ws[i]
            v = vs[i]
        w -= x
        yield v
        n -= 1

for i in weighted_sample(w, v, 500):
    print i

Monday, July 5, 2010

Change Point Analysis using MCMC Gibbs Sampling on Coal Mining Data (in Python)

The code is here. The analysis is performed on British coal mining accident data, which is included in the zip file as well. The function coal() performs change point analysis using MCMC Gibbs sampling, modeling the data with two Poisson distributions. Change point analysis finds the point k at which the second Poisson distribution takes over from the first, indicating a change in regulations, safety practices, etc. at that point in time. I wrote the Python script looking at Peter Neil's R code from Manchester University.
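
The linked file is not reproduced here, but the Gibbs scheme it implements looks roughly like this (a sketch with my own function name and Gamma(a, b) priors on the two rates, not the actual coal() code):

import numpy as np

def coal_gibbs(y, iters=5000, a=1.0, b=1.0):
    """Gibbs sampler for a single Poisson change point at year k."""
    n = len(y)
    cs = np.cumsum(y)
    k = n // 2
    samples = []
    for _ in range(iters):
        # conjugate Gamma updates for the two Poisson rates
        lam1 = np.random.gamma(a + cs[k-1], 1.0 / (b + k))
        lam2 = np.random.gamma(a + cs[-1] - cs[k-1], 1.0 / (b + n - k))
        # full conditional of k, on the log scale for numerical stability
        ks = np.arange(1, n)
        logp = cs[ks-1] * np.log(lam1) - ks * lam1 \
             + (cs[-1] - cs[ks-1]) * np.log(lam2) - (n - ks) * lam2
        p = np.exp(logp - logp.max())
        k = np.random.choice(ks, p=p / p.sum())
        samples.append((k, lam1, lam2))
    return samples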

Also included is the same analysis performed using Bugs (JAGS). This code is adapted from the book Bayesian Computation with R.