  1. Sometimes date parsing and formatting in Python takes a long time. It can be worth writing custom date-string converters that sacrifice generality for speed.
  2. Another oddity: forcing the timezone can speed up code as well, like this: os.environ['TZ'] = 'GMT'

Both tips are demoed and tested in the code snippet below.

import os
import time

def _convert_date(string, year=None):
 ''' take a log string, turn it into time epoch, tuple, string

 >>> _convert_date('Aug 19 13:45:01',2009)
 (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
 '''
 if year is None:  year = time.gmtime()[0]

 # strptime-based version; this profiled 4x slower than _convert_date2
 tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
 tt[-1] = 0 # turn off timezone
 tt= tuple(tt)
 ts = int(time.mktime(tt))
 return (ts,tt,string)

_months = dict(jan=1,feb=2,mar=3,apr=4,may=5,jun=6,jul=7,aug=8,sep=9,oct=10,nov=11,dec=12)
def _convert_date2(string, year=None):
 ''' take a log string, turn it into time epoch, tuple, string

 >>> _convert_date2('Aug 19 13:45:01',2009)
 (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
 '''
 if year is None:  year = time.gmtime()[0]

 # the strptime version was here, but it profiled 4x slower:
 #tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
 mon,d,t  = string.split()
 h,m,s = t.split(":")
 mon = _months[mon.lower()]
 tt = [year, mon,d,h,m,s,0,0,0]
 tt = tuple([int(v) for v  in tt])
 ts = int(time.mktime(tt))
 tt = time.gmtime(ts)
 return (ts,tt,string)

assert _convert_date('Aug 19 13:45:01',2009) == _convert_date2('Aug 19 13:45:01',2009)

#%timeit is an ipython macro that is like timeit.Timer with brains!

# including figuring out how many loops to run heuristically

# key fact:  a microsecond is 1000 nanoseconds

timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)
os.environ['TZ'] = 'GMT'
timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)

Results  (Python 2.4.3 on x64 Linux):

In [10]: timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 62 µs per loop

In [11]: timeit _convert_date2('Aug 19 13:45:01',2009)
10000 loops, best of 3: 18.3 µs per loop

In [12]: os.environ['TZ'] = 'GMT'

In [13]: timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 60.2 µs per loop

In [14]: timeit _convert_date2('Aug 19 13:45:01',2009)
100000 loops, best of 3: 13.3 µs per loop

The Win Factor:

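  • custom parser:  about 3.4x faster (62 µs down to 18.3 µs)
  • setting TZ:  about 27% less time for the custom parser (18.3 µs down to 13.3 µs), but only about 3% for strptime (62 µs down to 60.2 µs)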

Feedback and additional speedup improvements welcome.

(Thanks to Jon Nelson of the Pycurious blog for the TZ idea.)
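For readers without IPython, the same comparison can be sketched with the standard-library timeit module. This is a minimal Python 3 sketch, not the code behind the measurements above. The names parse_with_strptime and parse_by_hand are made up for illustration, and calendar.timegm is swapped in for time.mktime so that both functions treat the log time as UTC regardless of the local TZ setting.

```python
import calendar
import time
import timeit

_MONTHS = dict(jan=1, feb=2, mar=3, apr=4, may=5, jun=6,
               jul=7, aug=8, sep=9, oct=10, nov=11, dec=12)

def parse_with_strptime(string, year):
    """General but slower: let strptime do the parsing."""
    tt = time.strptime("%d %s" % (year, string), "%Y %b %d %H:%M:%S")
    return calendar.timegm(tt)   # seconds since the epoch, interpreting tt as UTC

def parse_by_hand(string, year):
    """Fast but rigid: only understands 'Mon DD HH:MM:SS'."""
    mon, day, clock = string.split()
    hour, minute, second = clock.split(":")
    tt = (year, _MONTHS[mon.lower()], int(day),
          int(hour), int(minute), int(second), 0, 0, 0)
    return calendar.timegm(tt)

sample = "Aug 19 13:45:01"

# Both converters must agree before their speed is worth comparing.
assert parse_with_strptime(sample, 2009) == parse_by_hand(sample, 2009)

for func in (parse_with_strptime, parse_by_hand):
    seconds = timeit.timeit(lambda: func(sample, 2009), number=10000)
    print("%s: %.2f us per call" % (func.__name__, seconds / 10000 * 1e6))
```

The absolute numbers will differ by machine and Python version, so treat the output as a relative comparison only.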

The Project

Create an accessible ‘learn to program’ class, using Python. Undo damage and barriers to access around geek culture, endemic sexism and racism, and models that say that “only certain people can program”.

Results So Far

So far there have been two class sessions. The gender mix (self-identified) is about 50/50/0 male/female/(genderqueer, intersex) and we have 10 students or so. The self-identified goals of students included: building programs for work, changing careers, remedying previous bad programming class experiences, and (rarer) learning Python specifically (after knowing some other language).

Lessons Learned (and some Theories)

# Make the class accessible

  • No alpha male bulls**t
  • No pissfighting over languages, programming backgrounds, etc. Remember, even experts start as newbies.
  • Create safer, accessible spaces: make them physically accessible, make childcare credits available, advertise to underserved communities, avoid gender / sexuality assumptions, respect pronouns, and enforce safer space.

# Emotions matter in the learning experience

  • acknowledge the complexity of programming
  • programmers are made, not born
  • programming is hard to do, hard to learn
  • explain that it was hard for you to learn as well.
  • remind learners that making mistakes is how one learns to program

# Start far back. Go back further. Most students know little about how the computer works.

  • they haven’t seen / heard of / used the command line / terminal
  • they don’t know the difference between the shell and the python environment
    • they try things like ">>> python program.py"
  • there will be mac and windows users, prepare for both
  • some learners will have programmed before, some will not

# Have a goal / main project for the course

  • connect with students.
  • build toward a full project
  • lessons should iteratively replace / improve / expand on code made during previous lessons
  • No math. Math algorithms are boring and irrelevant for most people. Python makes strings easy, and easy strings make for easy-to-discuss, real-world data.

# Don’t get bogged down in syntax. People don’t care. Python has awesome syntax, mostly.

  • Gloss over warts and complexities
  • Avoid jargon

# Don’t get bogged down in datatypes. Don’t mention unicode. Ignore tuples.

  • Do mention strings, “numbers” (encompassing ints and floats)
  • dictionaries before lists. Associating keys and values parallels associating variable names with values. After teaching dicts, lists are trivial.
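One way to show that parallel in a first lesson, as a rough sketch (the names and values here are made up for illustration):

```python
# A variable name is a label attached to a value.
favorite_color = "green"

# A dictionary is a bag of labels (keys), each attached to a value.
student = {"name": "Ada", "favorite_color": "green"}
print(student["favorite_color"])   # look up a value by its key

# A list is then the same idea, with positions 0, 1, 2... as the keys.
colors = ["green", "blue", "red"]
print(colors[0])                   # look up a value by its position
```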

# Relate functions and data structures. They are intertwined and need to be taught in parallel.

  • Functions exist to process data structures, and data exist to feed functions

# Ignore Objects and Object-Oriented Programming

  • OO isn’t hard, but it is confusing, especially for newbies
  • More importantly, it’s *irrelevant* for most early programming tasks

# Now matters more than Complete

  • Use Wikibooks or Google Docs for ease in sharing materials. (if repeating, we might choose GDocs — Wikibooks is too much machinery)
  • Don’t worry about getting all the details right

# POWERPOINT IS DEATH

After reading Kirrily Roberts’ OSCON keynote, and links from there to GeekFeminism (via Lindsey Kuper), I’ve been riled up about barriers to access in the programming community. I come from a non-traditional programming background (more on that in later journals), and had a lot of baggage about the mystique of programming. So a friend and I decided to do something about it and offer some free classes for non-traditional programmers through the Twin Cities Experimental College.


Bits and Bites — Programming First Steps


Fresh on the heels of Tornado’s release, and Glyph’s response to it (note 1) and others, I’ve been thinking about why Tornado so excites me.

Twisted is a robust, powerful, scalable asynchronous web framework (among other things). We have used it successfully in the past. Taking its authors at their word, Tornado is scalable, but it is focused on HTTP and much less fully featured than Twisted. It does provide authentication pieces (awesome!) and some other utilities. In architectural terms, Glyph is probably right that Tornado is incomplete (to be polite).

I still want to use Tornado.


I kept getting an error from createlang (PG 8.1 on CentOS 4, from when dinosaurs walked). First I tried this:

$ sudo yum install postgresql-python.x86_64

But this wasn’t enough to get createlang going.

$ sudo -u postgres createlang plpythonu mydb
Password:
createlang: language installation failed: ERROR:  could not access file "$libdir/plpython": No such file or directory

It turns out that there is a non-obvious dependency:

$ sudo yum install postgresql-python.x86_64 postgresql-pl.x86_64

$ sudo -u postgres createlang --echo plpythonu test3
SELECT oid FROM pg_catalog.pg_language WHERE lanname = 'plpythonu';
CREATE LANGUAGE "plpythonu";

Thus, postgresql-pl.x86_64 is a sooper sekrit dependency.

Good luck!

(ps.:  createlang --echo is useful)

The Right Way™ to make database access read-only is to create a read-only user.   So, why not just Do It Right™ ?

  1. Not all databases (eyes at you, SQLite!) support these fancy “users”.
  2. Sometimes creating this second user (and changing configuration files during program invocation) is overkill, or a hassle.
  3. During development, it’s nice to be able to temporarily make a database read-only, or read-only from a particular session.
  4. I am very lazy.

In SQLAlchemy, there is a simple solution: monkeypatch the session.flush method.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

## Based on code by Yannick Gingras

def abort_ro(*args,**kwargs):
    ''' the terrible consequences for trying
        to flush to the db '''
    print("No writing allowed, tsk!  We're telling mom!")
    return  

def db_setup(connstring='sqlite:///:memory:',
               echo=False, readonly=True):
    engine = create_engine(connstring, echo=echo)
    Session = sessionmaker(bind=engine, autoflush=False, autocommit=False)
    session = Session()
    if readonly:
        session.flush = abort_ro   # now it won't flush!

    return session, engine

Objects are still writable within the session, but nothing gets flushed to the database, so read-only access is now enforced at the session level.

I was having some trouble installing a network card on my home server.  It wasn’t being autodetected.  I’m weak on Linux hardware stuff, and networking in particular (since it’s always *just worked*), so this was starting from scratch.  I had a few extra handicaps: I didn’t have the original box, and this was a card I’d bought months ago, physically installed, and forgotten to configure, so my bad!  I didn’t even know the model number!

“Being this easy ain’t cheap.”  “There’s no such thing as a free lunch.”

We’ve all heard these tropes before, right?  Sometimes, without testing, it’s hard to see exactly how much that lunch costs.  This week’s example:  Python’s copy.deepcopy.

I tend to fancy myself as using a lot of functional programming techniques in my code, and as part of that, I try to avoid modifying data by side-effect.  Deepcopy makes it easy to copy the original structure, modify the copy, and return it.  After some profiling and timing work, I saw that, of all things, deepcopy was the bottleneck!

Sure, it’s bulletproof, battle-tested, and designed to do the Right Thing ™ in almost every case!  But for simple data structures, it can be overkill, since it does so much accounting, reference tracking, and the like.

Most of the data I see in my day job has simple formats: mainly dictionaries of lists, sets, strings, tuples, and integers.  These are the basic Python types we know and love, easily representable (in plain text, HTML, tables), and easy to munge / transmit (using JSON or the like).  In short, they’re nice to work with, and transparent.

As it turns out, when we control the input data, we don’t need to worry as much about robustness.  Sure, the code below for “deepish_copy” doesn’t handle classes, nested iterables, generators, or nesting to arbitrary depth: it copies only one level down.  But it runs fast, as the speed results below show.

import timeit
from copy import deepcopy

def deepish_copy(org):
    '''
    much, much faster than deepcopy, for a dict of the simple python types.
    '''
    out = dict.fromkeys(org)
    for k,v in org.iteritems():
        try:
            out[k] = v.copy()   # dicts, sets
        except AttributeError:
            try:
                out[k] = v[:]   # lists, tuples, strings, unicode
            except TypeError:
                out[k] = v      # ints

    return out

def test_deepish_copy():
    o1 = dict(name = u"blah", id=1, att0 = (1,2,3), att1 = range(10), att2 = set(range(10)))
    o2 = deepish_copy(o1)
    assert o2 == o1, "not equal, but should be"
    del o2['att1'][-1]
    assert o2 != o1, "are equal, shouldn't be"

#prun for ii in xrange(1000):  o2 = deepcopy(o1)
#prun for ii in xrange(1000):  o2 = deepish_copy(o1)

o1 = dict(name = u"blah", id=1, att0 = (1,2,3), att1 = range(10), att2 = set(range(10)))

a = timeit.Timer("o2 = deepish_copy(o1)","from __main__ import deepish_copy,o1")
b = timeit.Timer("o2 = deepcopy(o1)","from __main__ import deepcopy,o1")

# 64-bit Linux, 1 GHz chip, Python 2.4.3
a.repeat(3,number=20000)
# [0.45441699028015137, 0.41893100738525391, 0.46757102012634277]
b.repeat(3,number=20000)
# [2.5441901683807373, 2.5316669940948486, 2.4751369953155518]

Using the custom-written code speeds things up quite a bit (about 5-fold!).  For me, where this copying *was* the bottleneck and I have to iterate over hundreds of thousands of these things, it made a noticeable difference in total run time.  Taking the 10 minutes it took to write this code was worth it.  So was profiling (using IPython’s simple %prun macro).

As always, to end with another cliche:  your mileage may vary… but if you’re not relying on the car manufacturers to design an engine for exactly your needs, you can probably improve it.
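The listing above targets Python 2 (iteritems, xrange, range returning a list). As a rough sketch only, here is how the same idea might look on Python 3. The name deepish_copy3 is made up, and the one behavioral difference assumed here is that Python 3 lists have their own .copy() method, so they are handled by the first branch instead of the slice.

```python
import timeit
from copy import deepcopy

def deepish_copy3(org):
    """One-level-deep copy of a dict whose values are simple built-in types."""
    out = {}
    for k, v in org.items():
        try:
            out[k] = v.copy()      # dicts, sets, and (on Python 3) lists
        except AttributeError:
            try:
                out[k] = v[:]      # tuples, strings, bytes
            except TypeError:
                out[k] = v         # ints, floats, None: immutable, safe to share
    return out

o1 = dict(name="blah", id=1, att0=(1, 2, 3),
          att1=list(range(10)), att2=set(range(10)))
o2 = deepish_copy3(o1)

# Time both approaches on the same object; results vary by machine.
for label, func in (("deepish_copy3", deepish_copy3), ("deepcopy", deepcopy)):
    seconds = timeit.timeit(lambda: func(o1), number=2000)
    print("%s: %.4f s for 2000 copies" % (label, seconds))
```

As with the original, values nested more than one level down (a list inside a list, say) are still shared between the copy and the original.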

Some simple data is surprisingly hard to find.  Case in point: for some mapping projects, I wanted an adjacency list of the US states.  I couldn’t find one easily, so I made one.   Spread it far and wide!  This should be a pretty easy to digest form.  Figure out what states are next to each other (neighbors) with ease!

Potential gotchas:

  1. I say the four corners (UT,CO,NM,AZ) all touch. If you don’t like it, take them out.
  2. DC is a state here, but Guam, USVI and others aren’t.
  3. It’ll be a cold day in hell before I recognize Missouri.

# Author Gregg Lind
# License:  Public Domain.    I would love to hear about any projects you use it for, though!

AK
AL,MS,TN,GA,FL
AR,MO,TN,MS,LA,TX,OK
AZ,CA,NV,UT,CO,NM
CA,OR,NV,AZ
CO,WY,NE,KS,OK,NM,AZ,UT
CT,NY,MA,RI
DC,MD,VA
DE,MD,PA,NJ
FL,AL,GA
GA,FL,AL,TN,NC,SC
HI
IA,MN,WI,IL,MO,NE,SD
ID,MT,WY,UT,NV,OR,WA
IL,IN,KY,MO,IA,WI
IN,MI,OH,KY,IL
KS,NE,MO,OK,CO
KY,IN,OH,WV,VA,TN,MO,IL
LA,TX,AR,MS
MA,RI,CT,NY,NH,VT
MD,VA,WV,PA,DC,DE
ME,NH
MI,WI,IN,OH
MN,WI,IA,SD,ND
MO,IA,IL,KY,TN,AR,OK,KS,NE
MS,LA,AR,TN,AL
MT,ND,SD,WY,ID
NC,VA,TN,GA,SC
ND,MN,SD,MT
NE,SD,IA,MO,KS,CO,WY
NH,VT,ME,MA
NJ,DE,PA,NY
NM,AZ,UT,CO,OK,TX
NV,ID,UT,AZ,CA,OR
NY,NJ,PA,VT,MA,CT
OH,PA,WV,KY,IN,MI
OK,KS,MO,AR,TX,NM,CO
OR,CA,NV,ID,WA
PA,NY,NJ,DE,MD,WV,OH
RI,CT,MA
SC,GA,NC
SD,ND,MN,IA,NE,WY,MT
TN,KY,VA,NC,GA,AL,MS,AR,MO
TX,NM,OK,AR,LA
UT,ID,WY,CO,NM,AZ,NV
VA,NC,TN,KY,WV,MD,DC
VT,NY,NH,MA
WA,ID,OR
WI,MI,MN,IA,IL
WV,OH,PA,MD,VA,KY
WY,MT,SD,NE,CO,UT,ID
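To use the list from Python, each line can be read as a state followed by its neighbors. The sketch below assumes the data has been saved as plain text in exactly the format above. The names parse_adjacency and asymmetric_pairs, and the SAMPLE excerpt, are illustrative and not part of the original data set.

```python
def parse_adjacency(text):
    """Turn 'ST,N1,N2,...' lines into a dict mapping state -> set of neighbors."""
    graph = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comment lines
            continue
        parts = line.split(",")
        graph[parts[0]] = set(parts[1:])       # a lone 'AK' gets an empty set
    return graph

def asymmetric_pairs(graph):
    """List (a, b) where a names b as a neighbor but b, if present, omits a."""
    return [(a, b)
            for a, neighbors in graph.items()
            for b in neighbors
            if b in graph and a not in graph[b]]

# A small excerpt of the list above, just to exercise the functions.
SAMPLE = """\
AK
ME,NH
NH,VT,ME,MA
WA,ID,OR
"""

graph = parse_adjacency(SAMPLE)
print(sorted(graph["NH"]))        # ['MA', 'ME', 'VT']
print(asymmetric_pairs(graph))    # [] means every listed border is mutual
```

Running asymmetric_pairs over the full file is a cheap sanity check after hand-editing the list, for instance after removing the four-corners borders.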

In some of our server code, we like to ensure we get unbuffered output, like in Perl.  In Python, this is easy to do:


import os
import sys

sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

However, running the wonderful nosetests on this code will barf, because nose reassigns stdout to a cStringIO for capturing, and that object has no fileno().

This is a workaround:


import sys
import os

try:   # get unbuffered output
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
except AttributeError: # under nose, sys.stdout is reassigned to a string buffer
    pass

def test():
    assert 1

Alternately, run nosetests -s, which disables the output capture feature.
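On Python 3 the os.fdopen(..., 'w', 0) idiom no longer works, because text streams cannot be fully unbuffered and the call raises ValueError. A rough equivalent, sketched here under the assumption that line buffering is close enough to "unbuffered" for log-style output, is to reconfigure the existing stream (Python 3.7+). The helper name make_line_buffered is made up for this example.

```python
import io
import sys

def make_line_buffered(stream=None):
    """Try to switch a text stream to line buffering; return True on success."""
    stream = sys.stdout if stream is None else stream
    try:
        # io.TextIOWrapper.reconfigure exists on Python 3.7+
        stream.reconfigure(line_buffering=True)
    except AttributeError:
        # a capturing test runner may have swapped in an object
        # (e.g. io.StringIO) that has no reconfigure(); leave it alone
        return False
    return True

make_line_buffered()   # safe to call whether or not stdout is being captured

def test():
    assert make_line_buffered(io.StringIO()) is False
```

Running the interpreter with python -u, or setting the PYTHONUNBUFFERED environment variable, avoids touching sys.stdout in code at all.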

Cf:  my embarrassing bug report over at nose