Two Simple Tips to Speed up Python Time Parsing
Posted: October 12, 2009 Filed under: modules, performance, Python
- Sometimes, date parsing and formatting in Python takes a long time. It can be worth writing a custom date-string converter that sacrifices generality for speed.
- Another oddity: forcing the timezone can speed up code as well, like this: os.environ['TZ'] = 'GMT'
Both tips are demonstrated and tested in the code snippet below.
    import os
    import time

    def _convert_date(string, year=None):
        ''' take a log string, turn it into time epoch, tuple, string
        >>> _convert_date('Aug 19 13:45:01',2009)
        (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
        '''
        if year is None:
            year = time.gmtime()[0]
        tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
        tt[-1] = 0  # force tm_isdst to 0 (no daylight saving adjustment)
        tt = tuple(tt)
        ts = int(time.mktime(tt))
        return (ts,tt,string)

    _months = dict(jan=1,feb=2,mar=3,apr=4,may=5,jun=6,
                   jul=7,aug=8,sep=9,oct=10,nov=11,dec=12)

    def _convert_date2(string, year=None):
        ''' take a log string, turn it into time epoch, tuple, string
        >>> _convert_date2('Aug 19 13:45:01',2009)
        (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
        '''
        if year is None:
            year = time.gmtime()[0]
        # was, but this profiled 4x slower
        #tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
        mon,d,t = string.split()
        h,m,s = t.split(":")
        mon = _months[mon.lower()]   # month name -> month number
        tt = [year, mon,d,h,m,s,0,0,0]
        tt = tuple([int(v) for v in tt])
        ts = int(time.mktime(tt))
        tt = time.gmtime(ts)         # fills in weekday and day-of-year
        return (ts,tt,string)

    # sanity check: both converters must agree
    # (mktime works in local time and gmtime in UTC, so this may fail
    # when the local zone has a nonzero UTC offset)
    assert _convert_date('Aug 19 13:45:01',2009) == _convert_date2('Aug 19 13:45:01',2009)

    # timeit is an ipython magic that is like timeit.Timer with brains!
    # including figuring out how many loops to run heuristically
    # key fact: a microsecond is 1000 nanoseconds
    # (the lines below are IPython input, not plain Python)
    timeit _convert_date('Aug 19 13:45:01',2009)
    timeit _convert_date2('Aug 19 13:45:01',2009)
    os.environ['TZ'] = 'GMT'
    timeit _convert_date('Aug 19 13:45:01',2009)
    timeit _convert_date2('Aug 19 13:45:01',2009)
Results (Python 2.4.3 on x64 Linux):
    timeit _convert_date('Aug 19 13:45:01',2009)
    10000 loops, best of 3: 62 µs per loop

    In [11]: timeit _convert_date2('Aug 19 13:45:01',2009)
    10000 loops, best of 3: 18.3 µs per loop

    In [12]: os.environ['TZ'] = 'GMT'

    In [13]: timeit _convert_date('Aug 19 13:45:01',2009)
    10000 loops, best of 3: 60.2 µs per loop

    In [14]: timeit _convert_date2('Aug 19 13:45:01',2009)
    100000 loops, best of 3: 13.3 µs per loop
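The timings above come from IPython. To repeat the comparison from a plain script, something like the following sketch with the standard `timeit` module should work. It is written for Python 3, and the names `parse_with_strptime`, `parse_by_hand`, and `best_usec` are made up for this example; they return only the epoch value, not the full tuple the original functions return. Absolute numbers will differ by machine and Python version.

```python
import time
import timeit

_MONTHS = dict(jan=1, feb=2, mar=3, apr=4, may=5, jun=6,
               jul=7, aug=8, sep=9, oct=10, nov=11, dec=12)


def parse_with_strptime(string, year):
    """General approach: let time.strptime do the parsing."""
    tt = list(time.strptime("%s %s" % (year, string), "%Y %b %d %H:%M:%S"))
    tt[-1] = 0  # tm_isdst = 0, same as the original snippet
    return int(time.mktime(tuple(tt)))


def parse_by_hand(string, year):
    """Custom approach: only handles the fixed 'Mon DD HH:MM:SS' layout."""
    mon, day, clock = string.split()
    hour, minute, second = clock.split(":")
    tt = (year, _MONTHS[mon.lower()], int(day),
          int(hour), int(minute), int(second), 0, 0, 0)
    return int(time.mktime(tt))


def best_usec(func, number=10000, repeat=3):
    """Best-of-`repeat` time per call, in microseconds."""
    timer = timeit.Timer(lambda: func("Aug 19 13:45:01", 2009))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


if __name__ == "__main__":
    for func in (parse_with_strptime, parse_by_hand):
        print("%-20s %.1f usec per call" % (func.__name__, best_usec(func)))
```

Taking the minimum over several repeats mirrors the "best of 3" figure that IPython reports.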
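The Win Factor:
- custom parser: roughly 3.4x faster (62 µs down to 18.3 µs per call)
- setting TZ: about 27% less time for the custom parser (18.3 µs down to 13.3 µs), but only about 3% for the strptime version (62 µs down to 60.2 µs)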
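A note on the TZ tip: the Python documentation for the `time` module says that a change to the `TZ` environment variable should be followed by `time.tzset()` (Unix only), and that relying on the change taking effect without it is not safe. A minimal sketch of that, where the helper name `force_gmt` is invented for this example:

```python
import os
import time


def force_gmt():
    """Pin the process timezone to GMT for later time.mktime() calls."""
    os.environ["TZ"] = "GMT"
    # time.tzset() is only available on Unix; it makes the time module
    # re-read the TZ environment variable.
    if hasattr(time, "tzset"):
        time.tzset()


force_gmt()
# After pinning the zone, mktime() interprets the tuple as GMT.
ts = int(time.mktime((2009, 8, 19, 13, 45, 1, 0, 0, 0)))
print(ts)
```

Keep in mind that this changes what the timestamps mean: log times are then read as GMT rather than local time, so it only fits when that is acceptable for the data.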
Feedback and additional speedup improvements welcome.
(Thanks to Jon Nelson of the Pycurious Blog for the TZ idea.)