I kept getting the error below from createlang (PostgreSQL 8.1 on CentOS 4, from when dinosaurs walked the earth). I first tried this:
$ sudo yum install postgresql-python.x86_64
But that alone wasn't enough to make this work:
$ sudo -u postgres createlang plpythonu mydb
createlang: language installation failed: ERROR: could not access file "$libdir/plpython": No such file or directory
It turns out there is a non-obvious dependency: the plpython shared library that createlang is looking for ships in the separate postgresql-pl package.
$ sudo yum install postgresql-python.x86_64 postgresql-pl.x86_64
$ sudo -u postgres createlang --echo plpythonu test3
SELECT oid FROM pg_catalog.pg_language WHERE lanname = 'plpythonu';
CREATE LANGUAGE "plpythonu";
(createlang --echo is useful: it prints the SQL it sends to the server, as shown above.)
HBase is the Hadoop database. Its (sic) an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.
HBase’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Try it if your plans for a data store run to big.
Well, my plans don't run to big, but they do run to indexed over time. Since every cell in an HBase table has a timestamp, it is easy to snapshot data over time and "roll back" a query as though it were asked at any point in the past. For data that changes rarely, but for which one wants a historical record, this might make querying with history much simpler.
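To make the "roll back" idea concrete, here is a toy, in-memory Python model of a single timestamped cell. It is not the HBase client API: VersionedCell and its put/get methods are hypothetical names for illustration only, and a real HBase table keeps only as many versions per cell as its column family's VERSIONS setting allows. Reading "as of" a timestamp returns the newest value written at or before that time.

```python
import bisect


class VersionedCell:
    """Toy model of a timestamped cell: every write is kept with its timestamp.

    Hypothetical illustration only; not the HBase API.
    """

    def __init__(self):
        self._timestamps = []  # kept sorted ascending
        self._values = []      # parallel to _timestamps

    def put(self, ts, value):
        i = bisect.bisect_left(self._timestamps, ts)
        if i < len(self._timestamps) and self._timestamps[i] == ts:
            self._values[i] = value  # same timestamp: the later write wins
        else:
            self._timestamps.insert(i, ts)
            self._values.insert(i, value)

    def get(self, as_of=None):
        """Newest value written at or before `as_of` (None means "now")."""
        if as_of is None:
            return self._values[-1] if self._values else None
        i = bisect.bisect_right(self._timestamps, as_of)
        return self._values[i - 1] if i > 0 else None


if __name__ == "__main__":
    cell = VersionedCell()
    cell.put(100, "Alice manages Sales")
    cell.put(200, "Bob manages Sales")
    print(cell.get())     # Bob manages Sales
    print(cell.get(150))  # Alice manages Sales
    print(cell.get(50))   # None (nothing written yet at t=50)
```

Keeping timestamps sorted means an "as of" read is a binary search rather than a scan, which loosely mirrors how HBase returns versions newest-first.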
Historical Data Example
Think about how an organization changes over time. Employees join and leave; business units might be bought and sold. One approach to modeling this is to take a snapshot every day and store it in an RDBMS. The snapshots will contain a lot of redundant information, since an organization changes little from one day to the next.
A simpler model is to enter a new snapshot of the organization only when it changes, overwriting the previous configuration. Since HBase automatically labels cells with timestamps, the history of earlier configurations comes for free.
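A minimal Python sketch of the change-only model, assuming an organization is represented as a plain dict mapping employee to business unit. record_if_changed and org_as_of are hypothetical helpers for illustration; in HBase the "as of" step would be a read at a given timestamp rather than a loop over a Python list.

```python
from datetime import date


def record_if_changed(history, day, org):
    """Append (day, org) only if org differs from the most recent snapshot."""
    if not history or history[-1][1] != org:
        history.append((day, dict(org)))  # store a copy, not a live reference
        return True
    return False


def org_as_of(history, day):
    """Return the snapshot in effect on `day`, or None if none existed yet.

    Assumes `history` is in ascending date order, as record_if_changed builds it.
    """
    current = None
    for snap_day, org in history:
        if snap_day > day:
            break
        current = org
    return current


daily_observations = [
    (date(2009, 2, 1), {"alice": "Sales", "bob": "Sales"}),
    (date(2009, 2, 2), {"alice": "Sales", "bob": "Sales"}),
    (date(2009, 2, 3), {"alice": "Sales", "bob": "Support"}),
    (date(2009, 2, 4), {"alice": "Sales", "bob": "Support"}),
]

history = []
for day, org in daily_observations:
    record_if_changed(history, day, org)

print(len(daily_observations), "daily observations,", len(history), "stored snapshots")
print(org_as_of(history, date(2009, 2, 2))["bob"])  # Sales
```

Four daily observations collapse to two stored snapshots, and any past day can still be answered from the snapshot in effect at the time.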
Setting it up
Using Ole-Martin Mørk’s instructions was a breeze! Even though I know almost nothing about Java and the Java environment, I managed it. I followed them, with these modifications:
- After downloading, unzipping, and symlinking to ~hbase, I put the whole tree under version control ($ git init; git add . ; git commit -m "initial checkin, as unpacked from source"), so that if I foul anything up, I can easily revert!
- Edit ~hbase/conf/hbase-env.sh to set the right JAVA_HOME, which for me (on Debian) is: export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
The start scripts use ssh to launch daemons, even on localhost. Since I don't have an SSH server (let alone passwordless SSH) set up on localhost, I get this error:
localhost: ssh: connect to host localhost port 22: Connection refused
The rest of the example seems to run fine though, and I’m in no mood to really track this down, since I’m still in the experiment phase.
I'm not sure whether I'll be going any deeper anytime soon, since I have a lot of SqlAlchemy code built around handling these sorts of "historical" queries (where inserting and updating are the real difficulties!), but I do like the idea of easily versioned, map-like data stores.
Quick Unix hint: to turn a Unix timestamp (seconds since the epoch) into a readable UTC date, use GNU date:
$ date -u "+%c" -d @1234567890
Fri 13 Feb 2009 11:31:30 PM UTC
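The same conversion in Python may be handier when the timestamps come out of a script. One thing to watch: HBase cell timestamps are milliseconds since the epoch, not seconds, so they need dividing by 1000 before being handed to date. The helper names below are illustrative, not part of any library.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds_to_utc(seconds):
    """Unix timestamp (seconds) -> timezone-aware UTC datetime, like `date -u -d @N`."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def hbase_millis_to_utc(millis):
    """HBase cell timestamp (milliseconds since the epoch) -> UTC datetime.

    Uses integer timedelta arithmetic so no precision is lost to floats.
    """
    return EPOCH + timedelta(milliseconds=millis)


print(epoch_seconds_to_utc(1234567890).isoformat())    # 2009-02-13T23:31:30+00:00
print(hbase_millis_to_utc(1234567890123).isoformat())  # 2009-02-13T23:31:30.123000+00:00
```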