An IanB-tastic day, or How A Bug Becomes a Fix

People don’t write enough about how they catch, report, and fix bugs. I hope others will follow my lead in exposing the process more.

  1. Tried to read IanB’s revised webob tutorial at
    http://pythonpaste.org/webob/do-it-yourself.html
  2. where I got annoyed by how copying and pasting the code is hard
    with the “>>>” and “…” symbols. At the first example!
  3. Since Python-Sphinx is the issue, spent some time in #python-docs
    discussing solutions with Taggnostr, including ones that other
    code highlighters use.
  4. Thought about how IanB probably likes that the code isn’t just
    cut and pastable, since typing it in yourself is much better for learning.
    Decided that I didn’t care!
  5. Built on Taggnostr’s jquery-based fix on the installed
    doctools.js file.
  6. Branched and checked out Sphinx from BitBucket to work on it
    more formally…
  7. where I promptly made a mess of things. I don’t know jquery or
    javascript very well, so there was a lot of fussing. The problem
    was that between my version and tip, underscore.js was added, so
    using the new doctools.js file in my generated sphinx html tree
    was causing some silent errors! Boy, JS seems to be hard to troubleshoot,
    and not very good at failing loudly!
  8. After finishing my fixes, checked in the fix to BitBucket and pushed.
  9. Made a pull-request, where I discovered that when you request a pull on your own
    branch at BB, it sends the request to you. This seems, hm…, unintuitive!
  10. Asked about it in #mercurial (who answered) and #bitbucket (who didn’t);
    people agreed that this was, um, odd behaviour.
  11. So, bug time at BitBucket, where after a search, I found Bug 681…
    (http://bitbucket.org/jespern/bitbucket/issue/681/master-repositories-dont-need-pull-request)
  12. …which was filed by IanB!
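
For the curious, the copy-and-paste annoyance from step 2 can also be worked around outside the browser. What follows is a minimal Python sketch, not the jQuery fix described above; `strip_prompts` is a name made up for this illustration, and it assumes prompts are exactly `>>> ` and `... `:

```python
def strip_prompts(text):
    """Keep only the code from a doctest-style session.

    Lines starting with '>>> ' or '... ' lose their prompt;
    every other line is treated as interpreter output and dropped.
    """
    cleaned = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">>> ") or stripped.startswith("... "):
            cleaned.append(stripped[4:])
        elif stripped in (">>>", "..."):
            cleaned.append("")  # a bare prompt ends a block
    return "\n".join(cleaned)


example = """>>> for i in range(2):
...     print(i)
...
0
1
"""
print(strip_prompts(example))
```

Pasting a session through something like this gives back code that can be run directly, at the cost of losing the expected output.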

Whython – Python For People Who Hate Whitespace

Whython : Whitespace Haters Python

https://writeonly.files.wordpress.com/2010/04/whython.png

Example

Clearly Confusing (standard 3.x):

for ii in range(10):
    print(ii)
    print("which is %s" % (['even','odd'][ii % 2]))

Improved:

for ii in range(10) {
    print(ii);
    print("which is %s" % (['even','odd'][ii % 2]));
}

Maximum Enterprise Whythonic:

for ii in range(10) { print(ii); print("which is %s" % (['even','odd'][ii % 2])); }

How about some Scheme with your Python?

defun myfun():  return 1
assert myfun() == 1

Or add some Ruby shine?

def myfun() BEGIN return 1; END
assert myfun() == 1

Why Whython?

  • Less Whitespace, More Enterprise
  • It’s not a real language without braces and semicolons
  • Whitespace delimited is like so restrictive, man!
  • Python sucks for code golf
  • Finally, a Python for everyone who can’t decide between tab and space
  • Possibly (as in the mathematical sense – a small non-zero probability)
    useful for doing command line one liners in python
  • Help determine how bad a PEP/development idea needs to be before
    someone gets kickbanned from #python-dev.

More seriously

  • reading the Dragon Book [Aho86] gives a person dangerous ideas
  • good excuse to deep dive into the python interpreter source code and the ast and dis modules
  • finally wanted to learn GDB and python -d debug mode
  • humoring trolls is fun
  • for education, the whitespace thing really can cause problems. When
    copying code out of books into IDLE or IPython, there are corner cases when
    it terminates blocks “too early”, confusing new learners.
  • preparation for the “Python Spring Cleaning” project, to see how hard it is
    to get and modify source, write a PEP, raise bug ideas, talk in irc, etc.
  • since this is unlikely to ever be adopted by Python (I hope!), it will
    remain a useful exercise, unlike other “bugs” which get fixed once and for
    all
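
As a taste of the ast and dis deep dive mentioned above, here is a small sketch using stock Python 3 (not Whython). It parses the “Clearly Confusing” loop, dumps its syntax tree, and disassembles the compiled bytecode; the exact output varies between Python versions, so none is shown:

```python
import ast
import dis

# The standard-Python version of the loop from the example section.
source = "for ii in range(10):\n    print(ii)\n"

# Parse into an abstract syntax tree and show its structure.
tree = ast.parse(source)
print(ast.dump(tree))

# Compile the tree and list the bytecode the interpreter would run.
code = compile(tree, "<example>", "exec")
dis.dis(code)
```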

Want It? (Download and Install)

Are you sure you can handle this level of awesome? Okay! Download and install:

http://bitbucket.org/gregglind/python-whython3k/src/

## Get the source!
$ hg clone https://gregglind@bitbucket.org/gregglind/python-whython3k/
    # or if you haven't jumped on the Mercurial bandwagon (http://mercurial.selenic.com/wiki/Tutorial)
    # then:  wget http://bitbucket.org/gregglind/python-whython3k/get/79a2c77fe3e1.zip and unzip it!
$ cd python-whython3k
$ ./configure  # go make a pot of tea
$ make         # go watch an episode of the IT Crowd (http://www.netflix.com/WiMovie/The_IT_Crowd_Series_1/70113774)
$ ./whython  # beautiful failure begins

Limitations

  • only simple_stmt are really usable in this way. That means that
    blocks (functions, if, else, etc.) can’t be nested inside a braced block.

Thanks to

  • The Authors of PEP 306
  • GVR, Martin v. Loewis (my umlaut is misbehaving!), Georg Brandl, Greg Ewing, Jeremy Hylton and others on the
    Python-Dev mailing list
  • Fred Drake, for responding to my crazy and incoherent email
  • gutworth, merwok, __ap__ and others in #python-dev

References

[Aho86] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman.
Compilers: Principles, Techniques, and Tools,
http://www.amazon.com/exec/obidos/tg/detail/-/0201100886/104-0162389-6419108

You Know, For Kids!

Swear words are a powerful motivator for novice programmers, and an unfortunate byproduct of advanced ones.

Teaching the computer to swear at you in random, creative ways is a powerful experience. It’s easy in most modern operating systems that include scripting languages (thanks for nothin’, Windows!*). So, you out there with the kids, the friends who want to learn, the curious, pass the magical power of curse words along.

* Yes, it’s even possible on Windows thanks to vbscript! On Mac / Linux / Unix, use your favorite: Python / Ruby / Bash / ….!
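
Here is roughly what that first lesson could look like in Python. It is only a sketch: the word lists are kid-safe placeholders invented for this example, and the whole point is for learners to swap in their own:

```python
import random

# Placeholder vocabulary; learners are expected to replace these.
ADJECTIVES = ["soggy", "wobbly", "pompous", "moldy"]
NOUNS = ["turnip", "weasel", "sock puppet", "doorknob"]


def insult():
    """Glue one random adjective to one random noun."""
    return "You %s %s!" % (random.choice(ADJECTIVES), random.choice(NOUNS))


for _ in range(3):
    print(insult())
```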

(inspired by Juliet at how-do-the-young-start-programming-nowadays)


No Geek Bulls**t Programming Class (Results so Far)

The Project

Create an accessible ‘learn to program’ class, using Python. Undo damage and barriers to access around geek culture, endemic sexism and racism, and models that say that “only certain people can program”.

Bits and Bites (at TC ExC0)

Choose Your Own Pyventure (Wikibook)

Results So Far

So far there have been two class sessions. The gender mix (self-identified) is about 50/50/0 male/female/(genderqueer, intersex) and we have 10 students or so. The self-identified goals of students included: building programs for work, changing careers, remedying previous bad programming class experiences, (rarer) learning python specifically (after knowing some other language).

Lessons Learned (and some Theories)

# Make the class accessible

  • No alpha male bulls**t
  • No pissfighting over languages, programming backgrounds, etc. Remember, even experts start as newbies.
  • Create safer, accessible spaces: physically accessible, make childcare credits available, advertise to underserved communities, avoid gender / sexuality assumptions, respect pronouns. Enforce safer space.

# emotions matter in the learning experience

  • acknowledge the complexity of programming
  • programmers are made, not born
  • programming is hard to do, hard to learn
  • explain that it was hard for you to learn as well.
  • remind learners that making mistakes is how one learns to program

# Start far back. Go back further. Most students know little about how the computer works.

  • they haven’t seen / heard of / used the command line / terminal
  • they don’t know the difference between the shell and the python environment
    • they try things like “>>> python program.py”
  • there will be mac and windows users, prepare for both
  • some learners will have programmed before, some will not

# Have a goal / main project for the course

  • connect with students.
  • build toward a full project
  • lessons should iteratively replace / improve / expand on code made during previous lessons
  • no math. Math algorithms are boring and irrelevant for most people. Python makes strings easy. Easy strings make for easy-to-discuss, real-world data

# Don’t get bogged down in syntax. People don’t care. Python has awesome syntax, mostly.

  • Gloss over warts and complexities
  • Avoid jargon

# Don’t get bogged down in datatypes. Don’t mention unicode. Ignore tuples.

  • Do mention strings, “numbers” (encompassing ints and floats)
  • dictionaries before lists. Associating keys and values parallels associating variable names with values. After teaching dicts, lists are trivial.

# relate functions and data structures. They are intertwined and need to be taught in parallel.

  • Functions exist to process data structures, and data exist to feed functions

# Ignore Objects and Object-Oriented Programming

  • OO isn’t hard, but it is confusing, especially for newbies
  • More importantly, it’s *irrelevant* for most early programming tasks

# Now matters more than Complete

  • Use Wikibooks or Google Docs for ease in sharing materials. (if repeating, we might choose GDocs — Wikibooks is too much machinery)
  • Don’t worry about getting all the details right

# POWERPOINT IS DEATH


When Great Features Aren’t Enough: Twisted, Tornado, the Zero-Step, and Activation Energy

Fresh on the heels of Tornado’s release, and Glyph’s response to it (note 1) and others, I’ve been thinking about why Tornado so excites me.

Twisted is a robust, powerful, scalable asynchronous web framework (among other things). We have used it successfully in the past. Taking its authors at their word, Tornado is scalable but focused on HTTP, and much less fully featured than Twisted, though it does provide authentication pieces (awesome!) and some other utilities. In architectural terms, Glyph is probably right that Tornado is incomplete (to be polite).

I still want to use Tornado.



Baby Steps into HBase

Today, after reading (the amazing and invaluable!) Understanding HBase and BigTable, while researching schemas for Google App Engine, I took my first tentative steps into using HBase.  About HBase:

HBase is the Hadoop database. Its (sic) an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.

HBase’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Try it if your plans for a data store run to big.

Well, my plans don’t run to big, but they do run to indexed over time.  Since every cell in an HBase table has a timestamp, it makes it really easy to snapshot data over time, and “rollback” a query as though it was asked at any point in the past.   For data that changes rarely over time, but for which one wants a historical record, this might make querying with history much simpler.

Historical Data Example

Think about how an organization changes over time. Employees enter and leave, business units might be bought and sold. One approach to modeling this is to take a snapshot every day, and store that in an RDBMS. The snapshots will have a lot of redundant information, since an org doesn’t really change very much.

A simpler model is to enter a new snapshot of the organization only when it changes, essentially overwriting the previous configuration. Since HBase automatically labels cells with a timestamp, this comes for free.
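
To make the “write only on change, read as of any time” idea concrete, here is a toy model in plain Python. It is not the HBase API; `VersionedCell` and its methods are names invented for this sketch, and the timestamps are arbitrary integers:

```python
import bisect


class VersionedCell:
    """One cell that keeps every (timestamp, value) pair it was given."""

    def __init__(self):
        self._timestamps = []  # kept sorted ascending
        self._values = []      # _values[i] was written at _timestamps[i]

    def put(self, timestamp, value):
        """Record a new version without discarding older ones."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def get(self, as_of=None):
        """Return the newest value written at or before `as_of`."""
        if as_of is None:
            i = len(self._timestamps)
        else:
            i = bisect.bisect_right(self._timestamps, as_of)
        if i == 0:
            return None  # nothing had been written yet
        return self._values[i - 1]


manager = VersionedCell()
manager.put(100, "alice")   # org chart changes at t=100
manager.put(250, "bob")     # and again at t=250
print(manager.get(as_of=200))  # the answer as it stood at t=200
print(manager.get())           # the current answer
```

Only two writes are stored, yet a query for any moment between them still gets the right answer, which is the property that makes daily snapshots unnecessary.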

Setting it up

Using Ole-Martin Mørk’s instructions was a breeze!  Even though I know almost nothing about Java and the Java environment, I managed it.  I followed them, with these modifications:

  1. After downloading, unzipping, and symlinking to ~/hbase, I put the whole thing under version control ($ git init; git add *; git commit -m "initial checkin, as unpacked from source"), so that if I foul anything up, I can easily revert!
  2. Edit ~/hbase/conf/hbase-env.sh to have the right “JAVA_HOME”, which for me (Debian) is -> export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

Since I don’t have passwordless ssh set up to localhost, I get this error:

~/hbase$ ~/hbase/bin/start-hbase.sh
localhost: ssh: connect to host localhost port 22: Connection refused

The rest of the example seems to run fine though, and I’m in no mood to really track this down, since I’m still in the experiment phase.

Future Steps

I’m not sure whether I’ll be going any deeper anytime soon, since I have a lot of SQLAlchemy code built around handling these sorts of ‘historical’ queries (where inserting and updating are the real difficulties!), but I do like the idea of easily versioned, map-like data stores quite well.


Lemon Candy and Dynamic Programming

Over at TheDailyWtf, hidden among some comments was an interesting dynamic programming problem:

Consider this problem:

George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.

Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.

But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he’ll only put a piece of candy back at most once per day.

What are the odds that when one piece of candy remains, it will be lemon flavored?

I posed the problem at a company where I used to work. All but one person tried to do it recursively. The remaining person tried to do it using an Excel spreadsheet!!!

Maybe I (and some of the other posters on that thread) are morons, but Excel (or in my case, OpenOffice) seemed like a fine way to solve it, so I did.
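
For readers without a spreadsheet handy, the same table can be filled in with a few lines of Python. This is a sketch of the dynamic-programming idea under my reading of the rules, not a copy of the spreadsheet: on any day George eats a cherry only if both draws are cherry, so with l lemon and c cherry pieces that happens with probability (c / (l + c))², and otherwise he eats a lemon. The function name is made up for this example:

```python
from fractions import Fraction


def p_last_is_lemon(lemons, cherries):
    """Probability that the final remaining candy is lemon."""
    # table[l][c] = probability the last candy is lemon, given a bag
    # currently holding l lemon and c cherry pieces.
    table = [[Fraction(0)] * (cherries + 1) for _ in range(lemons + 1)]
    for l in range(lemons + 1):
        for c in range(cherries + 1):
            n = l + c
            if n == 0:
                continue  # empty bag: never reached from a real start
            if n == 1:
                table[l][c] = Fraction(l)  # 1 if lemon remains, else 0
                continue
            p_cherry = Fraction(c, n) ** 2  # cherry drawn twice in a row
            p_lemon = 1 - p_cherry
            total = Fraction(0)
            if l > 0:
                total += p_lemon * table[l - 1][c]
            if c > 0:
                total += p_cherry * table[l][c - 1]
            table[l][c] = total
    return table[lemons][cherries]


print(float(p_last_is_lemon(90, 10)))
```

Each cell depends only on the cell with one fewer lemon and the cell with one fewer cherry, which is exactly why a spreadsheet grid is a natural fit.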

Read more about the Lemon-Cherry problem, and download the spreadsheets used to solve it


Git-svn clone the last few revisions

It can be awfully tempting to make some changes to an existing open-source project [1]. Some of that excitement diminishes when one realizes how long a git-svn clone will take on a large project repo, like Python. The gain git-svn gives you in terms of quick history lookup is taken as cost in the beginning.

Instead, we can do a “shallow-copy” to get the last few revisions. It seems that you need to use actual revision numbers for the first argument to -r, but I could be wrong. I tried using HEAD~1000:HEAD.

$ git svn clone http://svn.python.org/projects/python/trunk/ python-dev -r 65000:HEAD

If you find this is *still* taking too long, try canceling, changing into the directory, and issuing:

$ git svn fetch

Good luck all!

Notes

  1. Finally got my first one into python, #4568: remove limitation in varargs callback example.

Simple “object-db” using JSON and python-sqlite

As part of a much larger project, I have a group of “snapshots” of a complicated data structure.   I need to save these in a persistent way, and continue to have access to them, when needed.  My solution is to output the snapshots as JSON, and store them into a sqlite database*, where they will be persistent on disk as “jlobs” (json large objects).

This “sqlite as object-db”  has several advantages:

  1. atomic transactions,
  2. easy database replication,
  3. jlob can easily change format without affecting schema,
  4. very light runtime requirements.

Building off of the sqlite3 manual, it is easy to see how to  extract the json back *out* of the database.

There are  drawbacks to this approach, of course:

  1. you’re responsible for building and maintaining tables indexing any queryable elements of your jlob, if you want to be able to access them using SQL.
  2. sql normalization purists will throw up when they look at your schema

(*Note: if you are on CentOS 5 and do not have access to Python 2.5, make sure that you install python-sqlite2, for example from one of these rpms, rather than updating your python-sqlite in place. BAD THINGS WILL HAPPEN, including breaking yum.)

#!/usr/bin/python
import sys
if sys.version_info >= (2,5):
    import sqlite3
else:
    from pysqlite2 import dbapi2 as sqlite3

try:
    import json
except ImportError:
    import simplejson as json

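# turn any column declared with type "json" back into Python objects on read
sqlite3.register_converter("json", json.loads)

# PARSE_DECLTYPES is what ties the declared column type to the converter
conn = sqlite3.connect(":memory:",
    detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
c = conn.cursor()
c.row_factory = sqlite3.Row  # fields by name
d = conn.cursor()  # normal row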

json_string = json.dumps( dict(a=1,b=[1,2,3]))
conn.execute('''
    create table snapshot(
          id INTEGER PRIMARY KEY AUTOINCREMENT,
          mydata json);
    ''')
conn.execute('''
    insert into snapshot values
       (null, ?)''', (json_string,))

R1 = c.execute("select * from snapshot").fetchone()['mydata']
R2 = d.execute("select * from snapshot").fetchone()[1]
R3 = conn.execute("select * from snapshot").fetchone()[1]

assert R1==R2==R3 == {'a': 1, 'b': [1, 2, 3]}, "all should be equal"
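
The first drawback above, maintaining your own index tables, might look something like the following. This is a sketch under assumptions: the `snapshot_index` table, its columns, and the `save_snapshot` helper are all hypothetical names invented here, and the jlob is stored as plain TEXT to keep the example short:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table snapshot(id INTEGER PRIMARY KEY, mydata TEXT)")
# Hypothetical index table: one row per snapshot, holding the one
# jlob field ("a") that we want to be able to query with SQL.
conn.execute("create table snapshot_index(snapshot_id INTEGER, a INTEGER)")


def save_snapshot(conn, data):
    """Store the jlob and its index row in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "insert into snapshot values (null, ?)", (json.dumps(data),))
        conn.execute(
            "insert into snapshot_index values (?, ?)",
            (cur.lastrowid, data["a"]))
    return cur.lastrowid


save_snapshot(conn, {"a": 1, "b": [1, 2, 3]})
save_snapshot(conn, {"a": 2, "b": []})

# Query by the indexed field, then decode the matching jlobs.
rows = conn.execute(
    """select s.mydata from snapshot s
       join snapshot_index i on i.snapshot_id = s.id
       where i.a = ?""", (2,)).fetchall()
found = [json.loads(r[0]) for r in rows]
print(found)
```

Writing the jlob and its index row inside one transaction is what keeps the two from drifting apart, which leans on the “atomic transactions” advantage listed earlier.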


Len() calls can be SLOW in Berkeley Database and Python bsddb.

In my day-to-day coding work, I make extensive use of Berkeley DB (bdb) hash and btree tables. They’re really fast, easy-ish to use, and work for the apps I need them for (persistent storage of json and other small data structures).

So, this python code was having all kinds of weird slowdowns for me, and it was the len() call (of all things) that was causing the issue!

As it turns out, sometimes the Berkeley database does have to iterate over all keys to give a proper answer. Even the “fast stats” *number of records* call has to

References:
Jesus Cea’s comments on why bdb’s don’t know how many keys they have
db_stat tool description
DB->stat api