Len() calls can be SLOW in Berkeley Database and Python bsddb.

In my day-to-day coding work, I make extensive use of Berkeley DB (bdb) hash and btree tables. They’re really fast, easy-ish to use, and work for the apps I need them for (persistent storage of JSON and other small data structures).

So, my Python code was hitting all kinds of weird slowdowns, and it was the len() call (of all things) that was causing the issue!

As it turns out, sometimes the Berkeley database does have to iterate over all keys to give a proper answer. Even the “fast stats” *number of records* call sometimes has to walk the whole database to return an accurate count.
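One way to dodge the slow len() entirely is to track the record count yourself as you insert and delete. Here’s a rough sketch of the idea — it wraps a plain dict as a stand-in for a bsddb handle (the `CountedStore` name and the whole class are mine, not part of any bsddb API), but the same bookkeeping works against a real Berkeley DB table:

```python
class CountedStore:
    """Wrap a mapping and keep an explicit record count so we never
    have to walk every key just to answer len()."""

    def __init__(self, backing):
        self._db = backing
        self._count = len(backing)  # pay the full scan once, at open time

    def __setitem__(self, key, value):
        if key not in self._db:     # only bump the count for new keys
            self._count += 1
        self._db[key] = value

    def __delitem__(self, key):
        del self._db[key]
        self._count -= 1

    def __len__(self):
        return self._count          # O(1), no key iteration

store = CountedStore({})
store["a"] = "1"
store["b"] = "2"
del store["a"]
print(len(store))  # → 1
```

The trade-off is that the cached count can drift if another process writes to the same table, so this only helps when you own all the writers.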

Jesus Cea’s comments on why bdbs don’t know how many keys they have
db_stat tool description
DB->stat API

Dumping and loading a bsddb, for humans.

Sometimes things happen with Python shelves that screw up the bsddbs (Berkeley DB [bdb] databases*) that power them. A common way for this to happen is when two apps have one open for writing, and something goes flooey, like both trying to write to the same page. The bsddb emits this helpful error:

DBRunRecoveryError: [Terror, death and destruction will ensue] or something equally opaque and non-reassuring

So how to run the recovery, eh? Assuming you have the db_dump and db_load tools on your platform, take hints from the Library and Extension FAQ and try this bash snippet:


## example usage:
## $ bdb_repair  /path/to/my.db
function bdb_repair {
  BDIR=`dirname $1`     #  /path/to/dir
  BADDB=`basename $1`   #  bad.db
  cd $BDIR  && \
  cp $BADDB{,.bak}  # seriously!  back it up first
  db_dump -f $BADDB.dump  $BADDB   # might take a while
  db_load -f $BADDB.dump  $BADDB.repaired
  cp -f $BADDB.repaired $BADDB     # cp has no -o flag; -f overwrites
  cd -
}

So far, I’ve had universal success with this method.

If any bash gurus want to improve the error handling here, I’d appreciate it.
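In the meantime, here’s a sketch of the error-handling pattern I have in mind — same db_dump/db_load steps as above, but run in a `set -e` subshell so the first failed step aborts the whole repair instead of plowing ahead with a bad dump. The structure and the error message are my own guesses at what “better” looks like:

```shell
bdb_repair() {
  local target=$1
  local dir base
  dir=$(dirname "$target") || return 1
  base=$(basename "$target")
  (
    set -e                            # any failed command aborts the subshell
    cd "$dir"
    cp "$base" "$base.bak"            # seriously! back it up first
    db_dump -f "$base.dump" "$base"   # might take a while
    db_load -f "$base.dump" "$base.repaired"
    mv "$base.repaired" "$base"       # only reached if every step succeeded
  ) || { echo "bdb_repair: failed on $target" >&2; return 1; }
}
```

Because the `cd` happens inside a subshell, your working directory is untouched on both success and failure, and the original file is only replaced after a clean dump-and-load round trip.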

* Yes, I know this is redundant.