Two Simple Tips to Speed up Python Time Parsing
October 12, 2009
- Sometimes date parsing in Python takes a long time. When the input format is fixed, it can be worth writing a custom date-string converter that sacrifices generality for speed.
- Another oddity: setting the timezone by force can speed up code as well, like this: os.environ['TZ'] = 'GMT'
Both tips are demo'd and tested in the code snippet below.
import os
import time
def _convert_date(string, year=None):
    ''' take a log string, turn it into time epoch, tuple, string
    >>> _convert_date('Aug 19 13:45:01',2009)
    (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
    '''
    if year is None:
        year = time.gmtime()[0]
    # general-purpose version: let strptime do the parsing (the slow path)
    tt = list(time.strptime("%s %s" % (year, string), "%Y %b %d %H:%M:%S"))
    tt[-1] = 0  # force tm_isdst to 0 so mktime applies no DST adjustment
    tt = tuple(tt)
    ts = int(time.mktime(tt))
    return (ts, tt, string)
_months = dict(jan=1,feb=2,mar=3,apr=4,may=5,jun=6,jul=7,aug=8,sep=9,oct=10,nov=11,dec=12)
def _convert_date2(string, year=None):
    ''' take a log string, turn it into time epoch, tuple, string
    >>> _convert_date2('Aug 19 13:45:01',2009)
    (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
    '''
    if year is None:
        year = time.gmtime()[0]
    # was, but this profiled 4x slower
    #tt = list(time.strptime("%s %s" % (year, string), "%Y %b %d %H:%M:%S"))
    mon, d, t = string.split()    # 'Aug', '19', '13:45:01'
    h, m, s = t.split(":")
    mon = _months[mon.lower()]
    # weekday, yearday and isdst are left at 0; gmtime fills them in below
    tt = [year, mon, d, h, m, s, 0, 0, 0]
    tt = tuple([int(v) for v in tt])
    ts = int(time.mktime(tt))
    tt = time.gmtime(ts)
    return (ts, tt, string)
assert _convert_date('Aug 19 13:45:01',2009) == _convert_date2('Aug 19 13:45:01',2009)
# %timeit is an IPython magic that is like timeit.Timer with brains,
# including figuring out how many loops to run heuristically
# (with automagic on, it can be typed without the leading %)
# key fact: a microsecond is 1000 nanoseconds
timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)
os.environ['TZ'] = 'GMT'
timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)
Results (Python 2.4.3 on x64 Linux):
timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 62 µs per loop
In [11]: timeit _convert_date2('Aug 19 13:45:01',2009)
10000 loops, best of 3: 18.3 µs per loop
In [12]: os.environ['TZ'] = 'GMT'
In [13]: timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 60.2 µs per loop
In [14]: timeit _convert_date2('Aug 19 13:45:01',2009)
100000 loops, best of 3: 13.3 µs per loop
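Outside IPython, roughly the same comparison can be run with the standard-library timeit module. The sketch below is only an approximation of the session above: the names parse_strptime, parse_custom, best_usec and set_gmt are made up for illustration, the functions return just the epoch value, and the call to time.tzset() (Unix only) is an assumption on my part that is not in the original snippet. Absolute numbers will differ by machine and Python version.

```python
import os
import time
import timeit

# Month lookup, same idea as _months in the snippet above.
MONTHS = dict(jan=1, feb=2, mar=3, apr=4, may=5, jun=6,
              jul=7, aug=8, sep=9, oct=10, nov=11, dec=12)


def parse_strptime(string, year):
    """General parser: let time.strptime do the work."""
    tt = list(time.strptime("%s %s" % (year, string), "%Y %b %d %H:%M:%S"))
    tt[-1] = 0  # force tm_isdst to 0
    return int(time.mktime(tuple(tt)))


def parse_custom(string, year):
    """Hand-rolled parser for the fixed 'Mon DD HH:MM:SS' layout."""
    mon, day, clock = string.split()
    hour, minute, second = clock.split(":")
    tt = (int(year), MONTHS[mon.lower()], int(day),
          int(hour), int(minute), int(second), 0, 0, 0)
    return int(time.mktime(tt))


def best_usec(func, number=2000, repeat=3):
    """Best-of-`repeat` time per call, in microseconds."""
    timer = timeit.Timer(lambda: func("Aug 19 13:45:01", 2009))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


def set_gmt():
    """Force the process timezone to GMT (hypothetical helper)."""
    os.environ["TZ"] = "GMT"
    if hasattr(time, "tzset"):  # tzset is only available on Unix
        time.tzset()


if __name__ == "__main__":
    # Measure with the default timezone first, then again with TZ forced.
    for label in ("default TZ", "TZ=GMT"):
        if label == "TZ=GMT":
            set_gmt()
        print("%-10s strptime: %7.2f us   custom: %7.2f us"
              % (label, best_usec(parse_strptime), best_usec(parse_custom)))
```

Taking the minimum over several repeats mirrors the "best of 3" figure that IPython reports.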
The Win Factor:
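- custom parser: roughly 3.4x faster (62 µs down to 18.3 µs per call)
- setting TZ: roughly 27% less time for the custom parser (18.3 µs down to 13.3 µs), but only about 3% for the strptime version (62 µs down to 60.2 µs)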
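For anyone who wants to check the arithmetic, here is a small sketch that derives the ratios from the four timings reported above. The helper names speedup and time_saved_pct are illustrative only.

```python
# Best-of-3 timings from the results above, in microseconds per call.
STRPTIME_DEFAULT = 62.0
CUSTOM_DEFAULT = 18.3
STRPTIME_GMT = 60.2
CUSTOM_GMT = 13.3


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


def time_saved_pct(before, after):
    """Percentage of the original time that was saved."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    print("custom vs strptime, default TZ: %.2fx"
          % speedup(STRPTIME_DEFAULT, CUSTOM_DEFAULT))
    print("custom vs strptime, TZ=GMT:     %.2fx"
          % speedup(STRPTIME_GMT, CUSTOM_GMT))
    print("TZ=GMT saving, custom parser:   %.1f%%"
          % time_saved_pct(CUSTOM_DEFAULT, CUSTOM_GMT))
    print("TZ=GMT saving, strptime:        %.1f%%"
          % time_saved_pct(STRPTIME_DEFAULT, STRPTIME_GMT))
```

The TZ trick helps the custom parser far more than the strptime version, presumably because strptime's own parsing cost dominates that call.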
Feedback and additional speedup improvements welcome.
(Thanks to Jon Nelson of the Pycurious Blog for the TZ idea.)