A Scratchpad

Sunday, May 9, 2010

nosql

My comments in response to the nosql episode of the Command Line podcast

There were a few misperceptions about nosql databases. First of all, the advantage of nosql is not that it does something SQL databases "cannot do". It distributes data out of the box; distribution is so simplified and so ingrained in the product that you don't even think twice about it. With SQL databases, sharding and distribution are an afterthought. Not that you cannot DO these with SQL databases, it's just that with nosql these tasks are SIMPLER, included in the product from day one.

There are pedagogical issues at play here, which are almost as important as the technological ones.

The same is true for basic CRUD operations. They are SIMPLER with nosql than they are with SQL databases. With Google Bigtable, I define Model classes in Python, send them over to the cloud, and I _have_ a database. Following pointers, as in order.owner.address.street, is very simple to do and built in, in contrast to SQL databases, where you have to use something like Hibernate to achieve the same result.
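To make the pointer-following point concrete, here is a minimal sketch using plain Python dataclasses. The Order, Owner and Address classes and their fields are hypothetical names made up for illustration; this is not the App Engine datastore API, only the shape of the object navigation I mean.

```python
from dataclasses import dataclass


# Hypothetical model classes, plain Python objects for illustration only.
@dataclass
class Address:
    street: str
    city: str


@dataclass
class Owner:
    name: str
    address: Address


@dataclass
class Order:
    item: str
    owner: Owner


# Build a small object graph, then walk it by attribute access alone.
order = Order("book", Owner("Ada", Address("1 Main St", "Springfield")))
street = order.owner.address.street
print(street)
```

No query language or mapping layer shows up in the navigation itself; a datastore with model classes aims to keep that feel when the objects live in the cloud.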

Plus, nosql makes you conscious of sharding your data from day one; since joins are discouraged, you think about distribution, and you have to think big. Sure, for a small Web site with a small number of users you can use one database and keep using joins, but you can also use one nosql shard with LESS complicated queries (meaning no joins) and achieve the same result.
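As a rough sketch of what "thinking in shards" looks like, assume a key-value store where each record is denormalized (it carries everything needed to answer the query) and a hash of the key picks the shard. The names NUM_SHARDS, shard_for, put and get are hypothetical, and in-memory dicts stand in for the real shard servers.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard number with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# In-memory dicts standing in for separate shard servers.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


# The record is denormalized: the address lives with the user, so
# reading it back touches one shard and needs no join.
put("user:42", {"name": "Ada", "street": "1 Main St"})
print(get("user:42"))
```

With NUM_SHARDS set to 1 this is the "one nosql shard" case: the code does not change, only the number of places the data can land.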

Wednesday, March 24, 2010

Kalman Filter in Python

The attached Kalman filter code is based on the Python example found in the book Kalman and Bayesian Filters in Python by Labbe.

Thursday, January 21, 2010

Porting code from Matlab / Octave to Python Numpy

Things to watch out for while porting code from Octave / Matlab to Python Numpy

The V returned from Numpy's U, D, V = svd(...) is not the same V as in Matlab / Octave; it comes back already transposed. To access the equivalent, you need to use V.T in Python.
Instead of a = [2 3 4] you use a = [2, 3, 4]
Use * in place of .* (on Numpy arrays, * is elementwise multiplication)
A -1 in reshape means: I don't care how many rows there are, just work that dimension out from the array size and the column parameter. For example, reshape(-1, 3) gives as many rows as needed for 3 columns.
D returned from the svd call is not in diagonal matrix form. It is simply a vector of the values that form the diagonal of Matlab's D. This is most likely done for efficiency reasons, and it makes sense. If you need this data in diagonal form, simply call Numpy's diag(D) and it will form the square matrix for you.
Use dot() instead of * for matrix multiplication
Don't forget Matlab / Octave use 1-based indexing of arrays where Python uses 0. So x(3, :) in Matlab / Octave would become x[2, :].
Oh, [] instead of (). Of course.
** instead of .^
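Instead of find(), you write the filter condition directly on the matrix or vector variable itself, but then you need to call nonzero() to get index values; otherwise you get True / False values. The call ind = find(abs(x(3,:)) > bla) becomes ind = (abs(x[2, :]) > bla).nonzero(). Note that nonzero() returns a tuple with one index array per dimension, so the indices themselves are in ind[0].
The constant eps is not defined as a bare name; I simply hardcode it globally as eps = 1e-15. Numpy does expose the machine epsilon as finfo(float).eps (about 2.2e-16), which is the value Matlab's eps returns.
Instead of the special index value 'end', you have to use the negative index value -1. The same goes for end-1 and end-2, which become -2 and -3. Things get a bit confusing, however, when there is a "range" involved, such as all columns up to and including the last one. Python slices exclude their end point, so in that case you don't use -1 at all; you just leave the end of the range blank.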
Example: a = array([[1,3,8],[2,4,0],[9,9,9]])
print a[:,-1] gives [8 0 9]
print a[:,1:] gives [[3 8] [4 0] [9 9]]
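Several of the points above can be checked in one short script. This is a minimal sketch, assuming a reasonably recent Numpy imported as np; the variable names and the threshold value are arbitrary choices for illustration, and the Matlab equivalents are given in the comments.

```python
import numpy as np

a = np.array([[1.0, 3.0, 8.0], [2.0, 4.0, 0.0], [9.0, 9.0, 9.0]])

# svd gives the singular values as a 1-D vector and V already transposed.
U, d, Vt = np.linalg.svd(a)
V = Vt.T                          # counterpart of V in [U, D, V] = svd(a)
D = np.diag(d)                    # square diagonal matrix, like Matlab's D
reconstructed = U.dot(D).dot(Vt)  # Matlab: U * D * V'

elementwise = a * a               # Matlab: a .* a
matrix_prod = a.dot(a)            # Matlab: a * a
squared = a ** 2                  # Matlab: a .^ 2

# Matlab: ind = find(abs(a(2,:)) > 2.5)
# nonzero() returns a tuple, so take element 0 for the index array.
threshold = 2.5                   # arbitrary value for illustration
ind = (np.abs(a[1, :]) > threshold).nonzero()[0]

last_col = a[:, -1]               # Matlab: a(:, end)
from_second = a[:, 1:]            # Matlab: a(:, 2:end)
as_column = a.reshape(-1, 1)      # -1: let Numpy work out the row count

eps = np.finfo(float).eps         # machine epsilon instead of a hardcoded value

print(np.allclose(reconstructed, a))
print(ind, last_col, from_second.shape, as_column.shape)
```

One caveat: the signs of individual singular vectors can differ between Numpy and Matlab / Octave, so compare the product U * D * V' rather than U and V column by column.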

Reference

Sunday, December 27, 2009

Line Fitting using Kalman Filters

Download