Quick and Dirty JSON Speed Testing in Python
September 11, 2008
[See updated post for analysis using timeit]
As per Poromenos' request on Reddit, I decided to expand a bit on my cryptic comment about the major JSON packages in Python (simplejson, cjson, demjson):
My conclusion: use demjson if you really, really want to make sure everything is right and you don't care at all about time. Use simplejson if you're in the 99% of users who want reasonable performance over a broad range of objects. Use enhanced cjson 1.0.3x if you have reasonable JSON inputs and need much faster (10x) speed; that is, if the JSON step is the bottleneck.
More worrisome: demjson didn't properly handle the unicode string I threw at it.
The Test Setup
Python 2.4.3 on 64-bit Linux
simplejson: 1.9.2, with c-extensions turned on.*
demjson: 1.3
cjson: enhanced cjson 1.0.3x6 from http://python.cx.hu/python-cjson/
(We need the enhanced version for the simplified "encode using utf-8" interface.)
* To test for simplejson c-extensions:
> assert getattr(simplejson, '_speedups', None), "no speedups enabled"
Test Code
## simple json testing
import simplejson
import cjson
import demjson
class A(object):
    def __init__(self):
        self.var1 = 1
        self.var2 = dict(a=1,b=2,c=3)
## TEST DATA
set_ = set([1,2,3,4])
nested_dict = dict(v1="a", v2="b", v3=dict(n1=1,n2=2,n3=3))
ustring = u"a string with some unicod Andre\202"
#In case anyone is wondering, unicod is a text-encoding used by Nova Scotian fishermen.
class_= A()
## Dump and load methods
dumps = {
    "simplejson": simplejson.dumps,
    "cjson": lambda x: cjson.encode(x,encoding="utf-8"),
    "demjson": demjson.encode
}
loads = {
    "simplejson": simplejson.loads,
    "cjson": lambda x: cjson.decode(x,encoding="utf-8"),
    "demjson": demjson.decode
}
## Can the functions handle different data types
for thing_name in ("set_", "nested_dict", "ustring", "class_", ):
    thing = eval(thing_name)
    for k,fun in dumps.iteritems():
        try:
            out = fun(thing)
            print "SUCCESS: %s enocdes %s" % (k,thing_name)
            print out
        except Exception, e:
            print "ERROR: %s failed to enocde %s" % (k,thing_name)
            print "ERROR:", e
## Profiling code
from profile import run
for thing_name in ("nested_dict", "ustring", ):
    thing = eval(thing_name)
    for k,fun in dumps.iteritems():
        print k, thing_name
        run("for ii in xrange(10000): fun(thing)")
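A caveat on the profiling approach: the profile module adds overhead to every Python-level function call, so it inflates the numbers for pure-Python encoders much more than for C extensions. Here is a rough sketch of the same measurement with timeit instead. It assumes Python 3 and uses the standard-library json module (a descendant of simplejson) as a stand-in, since cjson and demjson may not be installed. time_encoder is a made-up helper name, not part of the original test.

```python
import json
import timeit

# Same test data as above, written for Python 3 (every str is unicode).
nested_dict = dict(v1="a", v2="b", v3=dict(n1=1, n2=2, n3=3))
ustring = "a string with some unicod Andre\202"

# Stand-in encoder table (assumption); add other libraries if installed.
dumps = {"json": json.dumps}


def time_encoder(fun, thing, number=10000, repeat=3):
    """Return the best time in seconds for `number` calls of fun(thing)."""
    timer = timeit.Timer(lambda: fun(thing))
    return min(timer.repeat(repeat=repeat, number=number))


results = {}
for thing_name, thing in (("nested_dict", nested_dict), ("ustring", ustring)):
    for k, fun in dumps.items():
        results[(k, thing_name)] = time_encoder(fun, thing)
        print("%s %s: %.4f s per 10000 calls" % (k, thing_name, results[(k, thing_name)]))
```

Taking the minimum over a few repeats is the usual timeit convention: it reports the run least disturbed by other activity on the machine.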
Capability Results
All three handled the nested dict fine; demjson's output was the most compact, as advertised.
demjson improperly encoded the unicode string (any ideas why, anyone?).
demjson encoded the set as a list, all others failed to encode it.
All failed to encode the class instance, as expected.
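One possible clue on the unicode question, offered as a guess rather than a diagnosis of demjson: \202 in the test string is an octal escape, so the last character is the control character U+0082, not an accented e. An encoder that escapes to ASCII emits \u0082, while one that passes the character through unescaped produces output that looks truncated on most terminals. The sketch below assumes Python 3 and the standard-library json module, not any of the three libraries tested here.

```python
import json
import unicodedata

ustring = "a string with some unicod Andre\202"
last = ustring[-1]

# "\202" is octal for code point 0x82, which is a control character.
print(hex(ord(last)), unicodedata.category(last))

# Escaping to ASCII (json's default) makes the character visible.
escaped = json.dumps(ustring)
print(escaped)

# Passing it through unescaped is still valid JSON, but it displays as nothing.
raw = json.dumps(ustring, ensure_ascii=False)
print(repr(raw))
```

Both forms decode back to the same string, so an unescaped result is not necessarily a wrong one.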
ERROR: cjson failed to enocde set_
ERROR: object is not JSON encodable
SUCCESS: demjson enocdes set_
[1,2,3,4]
ERROR: simplejson failed to enocde set_
ERROR: set([1, 2, 3, 4]) is not JSON serializable
SUCCESS: cjson enocdes nested_dict
{"v1": "a", "v2": "b", "v3": {"n1": 1, "n2": 2, "n3": 3}}
SUCCESS: demjson enocdes nested_dict
{"v1":"a","v2":"b","v3":{"n1":1,"n2":2,"n3":3}}
SUCCESS: simplejson enocdes nested_dict
{"v1": "a", "v2": "b", "v3": {"n1": 1, "n2": 2, "n3": 3}}
SUCCESS: cjson enocdes ustring
"a string with some unicod Andre\u0082"
SUCCESS: demjson enocdes ustring
"a string with some unicod Andre"
SUCCESS: simplejson enocdes ustring
"a string with some unicod Andre\u0082"
ERROR: cjson failed to enocde class_
ERROR: object is not JSON encodable
ERROR: demjson failed to enocde class_
ERROR: ('can not encode object into a JSON representation', <__main__.A object at 0x2aaaae9e2b90>)
ERROR: simplejson failed to enocde class_
ERROR: <__main__.A object at 0x2aaaae9e2b90> is not JSON serializable
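The set and class-instance failures are by design: JSON has no set or object-instance type. simplejson's dumps accepts a default callable for such objects. The sketch below uses the standard-library json module under Python 3, which has the same hook. The to_jsonable helper and its choices (a sorted list for sets, the attribute dict for instances) are illustrative assumptions, not what any of the three libraries do internally.

```python
import json


class A(object):
    def __init__(self):
        self.var1 = 1
        self.var2 = dict(a=1, b=2, c=3)


def to_jsonable(obj):
    """Hypothetical fallback, called only for objects json cannot encode itself."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)   # sets become sorted lists
    if hasattr(obj, "__dict__"):
        return vars(obj)     # instances become dicts of their attributes
    raise TypeError("%r is not JSON serializable" % (obj,))


set_json = json.dumps(set([1, 2, 3, 4]), default=to_jsonable)
class_json = json.dumps(A(), default=to_jsonable, sort_keys=True)
print(set_json)
print(class_json)
```

The round trip is lossy: decoding gives back a list and a plain dict, so the reader has to know which values were sets or instances.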
Timing Results
cjson is really, really fast, especially for the nested dict: 10,000 encodes took 0.19 CPU seconds under the profiler, against 9.16 for simplejson and 30.77 for demjson. In other (unpublished) experiments, it's even pretty close to the load/dump speeds of Python's binary pickling with cPickle.
cjson nested_dict
10003 function calls in 0.190 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 :0(setprofile)
10000 0.150 0.000 0.150 0.000 :3(<lambda>)
1 0.030 0.030 0.180 0.180 <string>:1(?)
1 0.010 0.010 0.190 0.190 profile:0(for ii in xrange(10000): fun(thing))
0 0.000 0.000 profile:0(profiler)
demjson nested_dict
4290003 function calls (4160003 primitive calls) in 30.770 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
550000 2.060 0.000 2.060 0.000 :0(append)
10000 0.050 0.000 0.050 0.000 :0(callable)
960000 2.870 0.000 2.870 0.000 :0(chr)
60000 0.180 0.000 0.180 0.000 :0(extend)
960000 2.640 0.000 2.640 0.000 :0(has_key)
170000 0.740 0.000 0.740 0.000 :0(hasattr)
820000 2.790 0.000 2.790 0.000 :0(isinstance)
20000 0.080 0.000 0.080 0.000 :0(iterkeys)
90000 0.550 0.000 0.550 0.000 :0(join)
80000 0.230 0.000 0.230 0.000 :0(len)
140000 0.540 0.000 0.540 0.000 :0(ord)
10000 0.090 0.000 0.090 0.000 :0(range)
1 0.000 0.000 0.000 0.000 :0(setprofile)
20000 0.120 0.000 0.120 0.000 :0(sort)
1 0.130 0.130 30.770 30.770 <string>:1(?)
30000 0.350 0.000 0.470 0.000 demjson.py:1220(encode_number)
80000 3.500 0.000 6.630 0.000 demjson.py:1378(encode_string)
10000 0.070 0.000 17.860 0.002 demjson.py:1714(encode)
130000/10000 3.210 0.000 17.750 0.002 demjson.py:1737(encode_helper)
20000/10000 2.390 0.000 16.910 0.002 demjson.py:1761(encode_composite)
10000 0.310 0.000 30.640 0.003 demjson.py:1896(encode)
20000 0.300 0.000 0.590 0.000 demjson.py:521(extend_and_flatten_list_with_sep)
80000 0.750 0.000 1.130 0.000 demjson.py:730(isstringtype)
10000 6.720 0.001 12.420 0.001 demjson.py:863(__init__)
10000 0.100 0.000 0.100 0.000 demjson.py:910(_set_strictness)
1 0.000 0.000 30.770 30.770 profile:0(for ii in xrange(10000): fun(thing))
0 0.000 0.000 profile:0(profiler)
simplejson nested_dict
1330003 function calls (950003 primitive calls) in 9.160 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
80000 0.370 0.000 0.370 0.000 :0(encode_basestring_ascii)
20000 0.100 0.000 0.100 0.000 :0(id)
270000 0.850 0.000 0.850 0.000 :0(isinstance)
20000 0.070 0.000 0.070 0.000 :0(iteritems)
10000 0.020 0.000 0.020 0.000 :0(join)
1 0.000 0.000 0.000 0.000 :0(setprofile)
1 0.030 0.030 9.160 9.160 <string>:1(?)
10000 0.110 0.000 9.130 0.001 __init__.py:190(dumps)
400000/260000 3.050 0.000 6.220 0.000 encoder.py:212(_iterencode_dict)
500000/260000 3.690 0.000 8.100 0.000 encoder.py:283(_iterencode)
10000 0.860 0.000 9.020 0.001 encoder.py:345(encode)
10000 0.010 0.000 0.010 0.000 encoder.py:369(iterencode)
1 0.000 0.000 9.160 9.160 profile:0(for ii in xrange(10000): fun(thing))
0 0.000 0.000 profile:0(profiler)
cjson ustring
10003 function calls in 0.090 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 :0(setprofile)
10000 0.030 0.000 0.030 0.000 :3(<lambda>)
1 0.060 0.060 0.090 0.090 <string>:1(?)
1 0.000 0.000 0.090 0.090 profile:0(for ii in xrange(10000): fun(thing))
0 0.000 0.000 profile:0(profiler)
demjson ustring
2500003 function calls in 16.090 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
50000 0.130 0.000 0.130 0.000 :0(append)
10000 0.020 0.000 0.020 0.000 :0(callable)
960000 2.650 0.000 2.650 0.000 :0(chr)
970000 2.390 0.000 2.390 0.000 :0(has_key)
10000 0.010 0.000 0.010 0.000 :0(hasattr)
70000 0.300 0.000 0.300 0.000 :0(isinstance)
20000 0.120 0.000 0.120 0.000 :0(join)
10000 0.040 0.000 0.040 0.000 :0(len)
330000 0.980 0.000 0.980 0.000 :0(ord)
10000 0.080 0.000 0.080 0.000 :0(range)
1 0.000 0.000 0.000 0.000 :0(setprofile)
1 0.090 0.090 16.090 16.090 <string>:1(?)
10000 2.060 0.000 3.490 0.000 demjson.py:1378(encode_string)
10000 0.070 0.000 3.980 0.000 demjson.py:1714(encode)
10000 0.250 0.000 3.870 0.000 demjson.py:1737(encode_helper)
10000 0.220 0.000 16.000 0.002 demjson.py:1896(encode)
10000 6.610 0.001 11.780 0.001 demjson.py:863(__init__)
10000 0.070 0.000 0.070 0.000 demjson.py:910(_set_strictness)
1 0.000 0.000 16.090 16.090 profile:0(for ii in xrange(10000): fun(thing))
0 0.000 0.000 profile:0(profiler)
simplejson ustring
50003 function calls in 0.310 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
10000 0.050 0.000 0.050 0.000 :0(encode_basestring_ascii)
20000 0.070 0.000 0.070 0.000 :0(isinstance)
1 0.000 0.000 0.000 0.000 :0(setprofile)
1 0.040 0.040 0.310 0.310 <string>:1(?)
10000 0.090 0.000 0.270 0.000 __init__.py:190(dumps)
10000 0.060 0.000 0.180 0.000 encoder.py:345(encode)
1 0.000 0.000 0.310 0.310 profile:0(for ii in xrange(10000): fun(thing))
0 0.000 0.000 profile:0(profiler)
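The cPickle comparison mentioned at the start of this section was never published, so the following is only a sketch of how such a comparison could be set up, not a reproduction of it. It assumes Python 3, where cPickle has been folded into pickle, and again uses the standard-library json in place of cjson. roundtrip_time is a hypothetical helper.

```python
import json
import pickle
import timeit

nested_dict = dict(v1="a", v2="b", v3=dict(n1=1, n2=2, n3=3))


def roundtrip_time(dump, load, thing, number=10000):
    """Time `number` dump-then-load round trips after checking the data survives."""
    assert load(dump(thing)) == thing
    return timeit.timeit(lambda: load(dump(thing)), number=number)


timings = {
    "json": roundtrip_time(json.dumps, json.loads, nested_dict),
    "pickle": roundtrip_time(
        lambda x: pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL),
        pickle.loads,
        nested_dict,
    ),
}
for name, seconds in sorted(timings.items()):
    print("%s: %.4f s per 10000 round trips" % (name, seconds))
```

The numbers depend heavily on the data; a small dict of short strings and ints like this one says little about large or deeply nested payloads.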
Final Conclusion
I also think simplejson is pretty rad. In my applications, though, JSON conversion is one of the bottlenecks (rather than network timing, for example), and I have very well-defined, trusted input and output, so I use cjson-enhanced.