Installing PL/Python (Postgres 8.1 on CentOS 4)

I kept getting an error from createlang when trying to install PL/Python (PG 8.1 on CentOS 4, from when dinosaurs walked).  I first tried this:

$ sudo yum install postgresql-python.x86_64

But this wasn’t enough to get createlang going.

$ sudo -u postgres createlang plpythonu mydb
createlang: language installation failed: ERROR:  could not access file "$libdir/plpython": No such file or directory

It turns out that there is a non-obvious dependency:

$ sudo yum install postgresql-python.x86_64 postgresql-pl.x86_64

$ sudo -u postgres createlang --echo plpythonu test3
SELECT oid FROM pg_catalog.pg_language WHERE lanname = 'plpythonu';
CREATE LANGUAGE "plpythonu";

Thus, postgresql-pl.x86_64 is a sooper sekrit dependency.

Good luck!

(P.S.: createlang --echo is useful, since it prints the SQL commands it runs.)

Baby Steps into HBase

Today, after reading (the amazing and invaluable!) Understanding HBase and BigTable, while researching schemas for Google App Engine, I took my first tentative steps into using HBase.  About HBase:

HBase is the Hadoop database. Its (sic) an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.

HBase’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Try it if your plans for a data store run to big.

Well, my plans don’t run to big, but they do run to indexed over time.  Since every cell in an HBase table has a timestamp, it is really easy to snapshot data over time, and to “roll back” a query as though it were asked at any point in the past.  For data that changes rarely, but for which one wants a historical record, this might make querying with history much simpler.

Historical Data Example

Think about how an organization changes over time.  Employees enter and leave, business units might be bought and sold.  One approach to modeling this is to take a snapshot every day, and store that in an RDBMS.  The snapshots will have a lot of redundant information, since an org doesn’t really change very much from day to day.

A simpler model is to enter a new snapshot of the organization only when it changes, essentially overwriting the previous configuration.  Since HBase automatically labels each cell with a timestamp, this comes for free.
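
To make the idea concrete, here is a toy sketch in plain Python.  It is not the HBase API, just an in-memory model of timestamped cells; the class name VersionedTable, the "info:unit" column, and the sample timestamps are all made up for illustration.  A value is written only when it changes, and a read can ask for the state "as of" some earlier time.

```python
import bisect
from collections import defaultdict


class VersionedTable:
    """Toy in-memory model of timestamped cells (hypothetical, not HBase)."""

    def __init__(self):
        # (row, column) -> parallel lists, kept sorted by timestamp
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell; older versions are kept."""
        key = (row, column)
        i = bisect.bisect_right(self._timestamps[key], timestamp)
        self._timestamps[key].insert(i, timestamp)
        self._values[key].insert(i, value)

    def get(self, row, column, as_of=None):
        """Return the newest value written at or before as_of (latest if None)."""
        key = (row, column)
        stamps = self._timestamps.get(key, [])
        if not stamps:
            return None
        if as_of is None:
            return self._values[key][-1]
        i = bisect.bisect_right(stamps, as_of)
        return self._values[key][i - 1] if i > 0 else None


org = VersionedTable()
org.put("alice", "info:unit", "Widgets", timestamp=100)
org.put("alice", "info:unit", "Gadgets", timestamp=250)  # written only on change

print(org.get("alice", "info:unit"))             # latest version
print(org.get("alice", "info:unit", as_of=200))  # the org as it looked at t=200
print(org.get("alice", "info:unit", as_of=50))   # before alice was recorded
```

Only two writes are stored for alice, yet any point in time can be queried, which is the saving over a daily snapshot.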

Setting it up

Using Ole-Martin Mørk’s instructions was a breeze!  Even though I know almost nothing about Java and the Java environment, I managed it.  I followed them, with these modifications:

  1. After downloading, unzipping, and symlinking to ~hbase, I put the whole thing under version control ( $ git init; git add *; git commit -m "initial checkin, as unpacked from source" ), so that if I foul anything up, I can easily revert!
  2. Edit ~hbase/conf/ to have the right “JAVA_HOME” which for me (Debian) is  -> export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

Since I don’t have an ssh server accepting connections on localhost (let alone passwordless ssh set up), I get this error:

~/hbase$ ~/hbase/bin/
localhost: ssh: connect to host localhost port 22: Connection refused

The rest of the example seems to run fine though, and I’m in no mood to really track this down, since I’m still in the experiment phase.

Future Steps

I’m not sure whether I’ll be going any deeper anytime soon, since I have a lot of SQLAlchemy code built around handling these sorts of ‘historical’ queries (where inserting and updating are the real difficulties!), but I do quite like the idea of easily versioned, map-like data stores.

Quick hint: Converting from Epoch to Localtime using Unix Date

Quick hint on Unix (this needs GNU date; drop -u to get local time rather than UTC):

$ date -u "+%c" -d @1234567890
Fri 13 Feb 2009 11:31:30 PM UTC

cf:   Convert timestamp to date in Bash
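
For systems without GNU date, roughly the same conversion can be sketched with Python's standard library.  This is only an illustration of the same idea, using the same sample epoch value as above; the variable names are mine.

```python
from datetime import datetime, timezone

epoch = 1234567890  # seconds since 1970-01-01 00:00:00 UTC

# Interpret the epoch value as an aware datetime in UTC.
as_utc = datetime.fromtimestamp(epoch, tz=timezone.utc)

# Convert the same instant to the system's local timezone.
as_local = as_utc.astimezone()

print(as_utc.isoformat())
print(as_local.isoformat())
```

Both values describe the same instant; only the offset used for display differs.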