Two Simple Tips to Speed up Python Time Parsing

  1. Date parsing with time.strptime can be slow in Python. It can be worth writing a custom date-string converter that sacrifices generality for speed.
  2. Another oddity: explicitly setting the timezone can speed up the code as well, like this: os.environ['TZ'] = 'GMT'
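
A minimal sketch of the second tip on its own. One detail beyond the one-liner above: the Python docs say a change to TZ should be followed by time.tzset() rather than relying on the C library to notice it. That call is Unix-only, so it is guarded here.

```python
import os
import time

os.environ['TZ'] = 'GMT'

# time.tzset() exists only on Unix; guard the call so the sketch also runs elsewhere.
if hasattr(time, 'tzset'):
    time.tzset()

# time.timezone is the offset of local standard time from UTC, in seconds;
# it should read 0 once GMT is in effect.
print(time.tzname, time.timezone)
```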

Both tips are demonstrated and tested in the code snippet below.

import os
import time

def _convert_date(string, year=None):
 ''' take a log string, turn it into time epoch, tuple, string

 >>> _convert_date('Aug 19 13:45:01',2009)
 (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
 '''
 if year is None:  year = time.gmtime()[0]

 # general-purpose strptime version; profiled roughly 4x slower than _convert_date2
 tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
 tt[-1] = 0 # clear the DST flag (tm_isdst)
 tt = tuple(tt)
 ts = int(time.mktime(tt))
 return (ts,tt,string)

_months = dict(jan=1,feb=2,mar=3,apr=4,may=5,jun=6,jul=7,aug=8,sep=9,oct=10,nov=11,dec=12)
def _convert_date2(string, year=None):
 ''' take a log string, turn it into time epoch, tuple, string

 >>> _convert_date2('Aug 19 13:45:01',2009)
 (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
 '''
 if year is None:  year = time.gmtime()[0]

 # this used to be the strptime call below, but that profiled roughly 4x slower
 #tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
 mon,d,t  = string.split()
 h,m,s = t.split(":")
 mon = _months[mon.lower()]
 tt = [year, mon,d,h,m,s,0,0,0]
 tt = tuple([int(v) for v  in tt])
 ts = int(time.mktime(tt))
 tt = time.gmtime(ts)
 return (ts,tt,string)

assert _convert_date('Aug 19 13:45:01',2009) == _convert_date2('Aug 19 13:45:01',2009)

# %timeit is an IPython magic that works like timeit.Timer with brains,
# including figuring out heuristically how many loops to run.
# Key fact: a microsecond is 1000 nanoseconds.

timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)
os.environ['TZ'] = 'GMT'
timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)
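
For anyone without IPython, here is a rough sketch of the same comparison using the standard timeit module. The names parse_strptime, parse_manual, and best_usec are made up for this illustration; the two parsers are trimmed-down stand-ins for the functions above that return only the epoch value. The loop counts are fixed by hand, since plain timeit.repeat does not pick them for you. Run it once as is and once with TZ=GMT set in the environment to compare.

```python
import time
import timeit

MONTHS = dict(jan=1, feb=2, mar=3, apr=4, may=5, jun=6,
              jul=7, aug=8, sep=9, oct=10, nov=11, dec=12)

def parse_strptime(string, year):
    """General-purpose version: let strptime do all the work."""
    tt = list(time.strptime("%s %s" % (year, string), "%Y %b %d %H:%M:%S"))
    tt[-1] = 0  # clear the DST flag (tm_isdst)
    return int(time.mktime(tuple(tt)))

def parse_manual(string, year):
    """Hand-rolled version: split the fields and convert them directly."""
    mon, d, t = string.split()
    h, m, s = t.split(":")
    fields = (year, MONTHS[mon.lower()], int(d), int(h), int(m), int(s), 0, 0, 0)
    return int(time.mktime(fields))

def best_usec(func, number=10000, repeat=3):
    """Best-of-`repeat` time per call, in microseconds."""
    timings = timeit.repeat(lambda: func('Aug 19 13:45:01', 2009),
                            number=number, repeat=repeat)
    return min(timings) / number * 1e6

if __name__ == "__main__":
    for func in (parse_strptime, parse_manual):
        print("%-15s %.1f usec per call" % (func.__name__, best_usec(func)))
```

The lambda adds a small, equal overhead to both measurements, so the ratio between the two is more trustworthy than the absolute numbers.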

Results (Python 2.4.3 on x64 Linux):

timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 62 µs per loop

In [11]: timeit _convert_date2('Aug 19 13:45:01',2009)
10000 loops, best of 3: 18.3 µs per loop

In [12]: os.environ['TZ'] = 'GMT'

In [13]: timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 60.2 µs per loop

In [14]: timeit _convert_date2('Aug 19 13:45:01',2009)
100000 loops, best of 3: 13.3 µs per loop

The Win Factor:

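  • custom parser:  roughly 3.4x faster (62 µs down to 18.3 µs per call)
  • setting TZ:  about 27% less time for the custom parser (18.3 µs down to 13.3 µs), but only about 3% for the strptime version (62 µs down to 60.2 µs)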

Feedback and additional speedup improvements welcome.

(Thanks to Jon Nelson of the Pycurious Blog for the TZ idea.)


No Geek Bulls**t Programming Class (Results so Far)

The Project

Create an accessible ‘learn to program’ class, using Python. Undo damage and barriers to access around geek culture, endemic sexism and racism, and models that say that “only certain people can program”.

Bits and Bites (at TC ExC0)

Choose Your Own Pyventure (Wikibook)

Results So Far

So far there have been two class sessions, with about 10 students. The self-identified gender mix is roughly 50/50/0 male/female/(genderqueer, intersex). Students' self-identified goals include building programs for work, changing careers, remedying previous bad programming class experiences, and (more rarely) learning Python specifically after already knowing another language.

Lessons Learned (and some Theories)

# Make the class accessible

  • No alpha male bulls**t
  • No pissfighting over languages, programming backgrounds, etc. Remember, even experts start as newbies.
  • Create safer, accessible spaces: make them physically accessible, make childcare credits available, advertise to underserved communities, avoid gender / sexuality assumptions, respect pronouns. Enforce safer space.

# Emotions matter in the learning experience

  • acknowledge the complexity of programming
  • programmers are made, not born
  • programming is hard to do, hard to learn
  • explain that it was hard for you to learn as well.
  • remind learners that making mistakes is how one learns to program

# Start far back. Go back further. Most students know little about how the computer works.

  • they haven’t seen / heard of / used the command line / terminal
  • they don’t know the difference between the shell and the python environment
    • they try things like ">>> python program.py"
  • there will be Mac and Windows users, prepare for both
  • some learners will have programmed before, some will not

# Have a goal / main project for the course

  • connect with students.
  • build toward a full project
  • lessons should iteratively replace / improve / expand on code made during previous lessons
  • no math. Math algorithms are boring and irrelevant for most people. Python makes strings easy, and easy strings make for easy-to-discuss, real-world data.

# Don’t get bogged down in syntax. People don’t care. Python has awesome syntax, mostly.

  • Gloss over warts and complexities
  • Avoid jargon

# Don’t get bogged down in datatypes. Don’t mention Unicode. Ignore tuples.

  • Do mention strings, “numbers” (encompassing ints and floats)
  • dictionaries before lists. Associating keys and values parallels associating variable names with values. After teaching dicts, lists are trivial.
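
A small sketch of that parallel, as it might be shown in class. The student data and the names student, languages, and by_position are invented for illustration and are not from the course materials.

```python
# A variable name is associated with a value...
name = "Ada"
language = "Python"

# ...and a dict associates keys with values in the same way.
student = {"name": "Ada", "language": "Python"}
print(student["name"])

# After that, a list is the same idea with positions 0, 1, 2, ... as the keys.
languages = ["Python", "Ruby", "C"]
by_position = {0: "Python", 1: "Ruby", 2: "C"}
print(languages[1], by_position[1])
```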

# Relate functions and data structures. They are intertwined and need to be taught in parallel.

  • Functions exist to process data structures, and data exist to feed functions
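
One way to show the two together, sketched with a made-up count_words function (not part of the course materials): the function exists to build a dict, and the dict exists to be handed to the loop that prints it. It also sticks to strings rather than math.

```python
def count_words(text):
    """Return a dict mapping each word in text to how often it appears."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words("the cat and the hat")
for word in sorted(counts):
    print(word, counts[word])
```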

# Ignore Objects and Object-Oriented Programming

  • OO isn’t hard, but it is confusing, especially for newbies
  • More importantly, it’s *irrelevant* for most early programming tasks

# Now matters more than Complete

  • Use Wikibooks or Google Docs for ease in sharing materials. (If repeating the class, we might choose Google Docs; Wikibooks is too much machinery.)
  • Don’t worry about getting all the details right

# POWERPOINT IS DEATH