Two Simple Tips to Speed up Python Time Parsing

  1. Sometimes, date parsing formatting in Python takes a long time. It can be worth writing custom datestring converters to sacrifice generality for speed.
  2. Another oddity:  setting the timezone by force can speed up code as well, like this: os.environ[‘TZ’] = ‘GMT’

Both tips are demo’d and tested in the code snipped below.

import os
import time

def _convert_date(string, year=None):
 ''' take a log string, turn it into time epoch, tuple, string

 >>> _convert_date2('Aug 19 13:45:01',2009)
 (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
 '''
 if year is None:  year = time.gmtime()[0]

 # was, but this profiled 4x slower
 tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
 tt[-1] = 0 # turn off timezone
 tt= tuple(tt)
 ts = int(time.mktime(tt))
 return (ts,tt,string)

_months = dict(jan=1,feb=2,mar=3,apr=4,may=5,jun=6,jul=7,aug=8,sep=9,oct=10,nov=11,dec=12)
def _convert_date2(string, year=None):
 ''' take a log string, turn it into time epoch, tuple, string

 >>> _convert_date2('Aug 19 13:45:01',2009)
 (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
 '''
 if year is None:  year = time.gmtime()[0]

 # was, but this profiled 4x slower
 #tt = list(time.strptime("%s " % year + x, "%Y %b %d %H:%M:%S"))
 mon,d,t  = string.split()
 h,m,s = t.split(":")
 mon = _months[mon.lower()]
 tt = [year, mon,d,h,m,s,0,0,0]
 tt = tuple([int(v) for v  in tt])
 ts = int(time.mktime(tt))
 tt = time.gmtime(ts)
 return (ts,tt,string)

assert _convert_date('Aug 19 13:45:01',2009) == _convert_date2('Aug 19 13:45:01',2009)

#%timeit is an ipython macro that is like timeit.Timer with brains!

# including figuring out how many loops to run heuristically

# key fact:  a microsecond is 1000 nanoseconds

timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)
os.environ['TZ'] = 'GMT'
timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)

Results  (Python 2.4.3 on x64 Linux):

timeit _convert_date(‘Aug 19 13:45:01’,2009)
10000 loops, best of 3: 62 µs per loop

In [11]: timeit _convert_date2(‘Aug 19 13:45:01’,2009)
10000 loops, best of 3: 18.3 µs per loop

In [12]: os.environ[‘TZ’] = ‘GMT’

In [13]: timeit _convert_date(‘Aug 19 13:45:01’,2009)
10000 loops, best of 3: 60.2 µs per loop

In [14]: timeit _convert_date2(‘Aug 19 13:45:01’,2009)
100000 loops, best of 3: 13.3 µs per loop

The Win Factor:

  • custom parser:  300%
  • setting TZ:  20%

Feedback and additional speedup improvements welcome.

(Thanks to Jon Nelson; of the Pycurious Blog for the TZ idea)



Leave a comment