Quick and (less?)-Dirty JSON Speed Testing in Python
September 12, 2008
In a previous article, I made some bold claims. After a good vetting on Reddit, the incomparable effbot pointed me toward timeit (cf: notes), commenting:
Quick, dirty, and quite possibly deeply flawed.
The profiler’s designed for profiling, not benchmarking, and Python code running under the profiler runs a lot slower than usual — but C code isn’t affected at all.
To get proper results, use the timeit module instead.
So, here is a revised analysis. It still looks like cjson strongly outperforms the others.* Most interestingly, I tried oblivion95’s suggestion to read in JSON using eval, and that turns out to be slower than cjson, which seems implausible to me. I look forward to corrections.
Results
dumping to JSON
(seconds per 10000 calls; 5 repeats, sorted fastest to slowest)
cjson      dump nested_dict  0.096393 0.096989 0.097203 0.097859 0.098357
demjson    dump nested_dict  4.589573 4.601798 4.609123 4.621567 4.625506
simplejson dump nested_dict  0.595901 0.596267 0.596555 0.597104 0.597633
cjson      dump ustring      0.024242 0.024264 0.024453 0.024475 0.024548
demjson    dump ustring      2.350742 2.363112 2.364416 2.365360 2.374244
simplejson dump ustring      0.039637 0.039668 0.039820 0.039890 0.039976
loading from JSON
(seconds per 10000 calls; 5 repeats, sorted fastest to slowest)
cjson      load nested_dict_json  0.042304 0.042332 0.042936 0.043246 0.043858
demjson    load nested_dict_json  8.317319 8.332928 8.334701 8.367242 8.371535
simplejson load nested_dict_json  1.858826 1.862957 1.864221 1.864268 1.868705
eval       load nested_dict_json  0.484512 0.485497 0.487538 0.487866 0.488751
cjson      load ustring_json      0.045566 0.045803 0.045846 0.046027 0.046056
demjson    load ustring_json      3.391110 3.401287 3.403575 3.408148 3.416667
simplejson load ustring_json      0.243784 0.244193 0.244920 0.245126 0.246061
eval       load ustring_json      0.121635 0.121801 0.122561 0.123064 0.123563
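To put the nested_dict timings in relative terms, here is a small sketch (written for Python 3, unlike the 2008 listing in this post) that takes the fastest of the five runs for each library and divides it by the cjson figure. The numbers are copied from the results above; the dictionary and function names are mine and purely illustrative.

```python
# Fastest of five runs (seconds per 10000 calls), copied from the results above.
best_dump = {"cjson": 0.096393, "simplejson": 0.595901, "demjson": 4.589573}
best_load = {
    "cjson": 0.042304,
    "eval": 0.484512,
    "simplejson": 1.858826,
    "demjson": 8.317319,
}


def slowdown_vs_cjson(best):
    """Return how many times slower each entry is than cjson."""
    base = best["cjson"]
    return {name: t / base for name, t in best.items()}


dump_ratios = slowdown_vs_cjson(best_dump)
load_ratios = slowdown_vs_cjson(best_load)

# Print each library's slowdown factor, fastest first.
for label, ratios in (("dump", dump_ratios), ("load", load_ratios)):
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print("%-10s %s nested_dict: %6.1fx cjson" % (name, label, ratio))
```

On this data, simplejson comes out roughly 6 times slower than cjson for dumping and roughly 44 times slower for loading, while eval is roughly 11 times slower than cjson for loading.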
## simple json testing
import simplejson
import cjson
import demjson
class A(object):
    def __init__(self):
        self.var1 = 1
        self.var2 = dict(a=1,b=2,c=3)
## TEST DATA
set_ = set([1,2,3,4])
nested_dict = dict(v1="a", v2="b", v3=dict(n1=1,n2=2,n3=3))
ustring = u"a string with some unicod Andre\202"
#In case anyone is wondering, unicod is a text-encoding used by Nova Scotian fishermen.
class_= A()
## Dump and load methods
dumps = {
    "simplejson": simplejson.dumps,
    "cjson": lambda x: cjson.encode(x,encoding="utf-8"),
    "demjson": demjson.encode
}
loads = {
    "eval": eval,
    "simplejson": simplejson.loads,
    "cjson": lambda x: cjson.decode(x,encoding="utf-8"),
    "demjson": demjson.decode
}
## Can the functions handle different data types
for thing_name in ("set_", "nested_dict", "ustring", "class_", ):
    thing = eval(thing_name)
    for k,fun in dumps.iteritems():
        try:
            out = fun(thing)
            print "SUCCESS: %s encodes %s" % (k,thing_name)
            print out
        except Exception, e:
            print "ERROR: %s failed to encode %s" % (k,thing_name)
            print "ERROR:", e
## Profiling code
#from profile import run
from timeit import Timer
for thing_name in ("nested_dict", "ustring", ):
    thing = eval(thing_name)
    # time our various jsons
    for k, fun in dumps.iteritems():
        print k, "dump", thing_name
        T = Timer("""fun(thing)""", "from __main__ import fun,thing")
        print " ".join(["%2.6f" % x for x in sorted(T.repeat(5, 10000))])
        #run("for ii in xrange(10000): fun(thing)")
nested_dict_json = simplejson.dumps(nested_dict)
ustring_json = simplejson.dumps(ustring)
for thing_name in ("nested_dict_json", "ustring_json"):
    thing = eval(thing_name)
    # time our various jsons
    for k, fun in loads.iteritems():
        print k, "load", thing_name
        T = Timer("""fun(thing)""", "from __main__ import fun,thing")
        print " ".join(["%2.6f" % x for x in sorted(T.repeat(5, 10000))])
        #run("for ii in xrange(10000): fun(thing)")
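The listing above is Python 2 and depends on three third-party modules. As a minimal sketch of the same timing loop for Python 3, the version below substitutes the standard library's json module for the libraries under test (an assumption made only so the example runs anywhere) and passes callables to timeit.repeat instead of importing names from __main__. The helper name time_call and the reduced loop count are mine, not part of the original benchmark.

```python
import json
import timeit
from functools import partial

nested_dict = dict(v1="a", v2="b", v3=dict(n1=1, n2=2, n3=3))
nested_dict_json = json.dumps(nested_dict)

# Stand-ins for the libraries benchmarked in the article (assumption).
dumps = {"json": json.dumps}
loads = {"json": json.loads, "eval": eval}


def time_call(fun, arg, repeat=5, number=1000):
    """Return sorted total times (seconds) for `number` calls, `repeat` times."""
    return sorted(timeit.repeat(partial(fun, arg), repeat=repeat, number=number))


results = {}
for name, fun in dumps.items():
    results[(name, "dump")] = time_call(fun, nested_dict)
for name, fun in loads.items():
    results[(name, "load")] = time_call(fun, nested_dict_json)

# Same output shape as the original: label, then the sorted timings.
for (name, action), times in sorted(results.items()):
    print(name, action, " ".join("%.6f" % t for t in times))
```

Each printed number is the total time for one batch of calls, so the lowest value in a row is usually the most representative; the higher ones mostly reflect interference from other processes.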
FOOTNOTES:
* Whew! I really dodged a bullet this time! I thought I was going to have a lot of egg on my face about this. Some days, one gets lucky, right?
September 14, 2008 at 11:54 am
“I really dodged a bullet this time!”
Well, I guess I should have mentioned that the C module was fast enough to win anyway. But if you look at the totals, the others are not quite as slow as the profiler results might have led you to believe.
As for why the eval function is slower, that’s because it uses Python’s standard compiler, which means that it builds a parse tree, generates byte code, and then executes the byte code to get the result. A dedicated interpreter (which is what cjson really is) can simply build an object tree instead of a parse tree, and skip the other steps.
September 14, 2008 at 7:14 pm
Thanks for clarifying the mechanism for why eval is slower! I don’t know the core CPython implementation well enough to answer questions like that. Also, it was good to finally learn timeit.
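The mechanism described in the comment above (build a tree, generate byte code, execute it) can be made visible from Python itself. The sketch below uses Python 3 and the standard library only. It runs the three stages by hand and compares the result with a direct JSON parse. Here json.loads stands in for a dedicated parser such as cjson, which is an assumption for illustration.

```python
import ast
import json

source = '{"v1": "a", "v2": "b", "v3": {"n1": 1, "n2": 2, "n3": 3}}'

# Step 1: build a syntax tree from the text.
tree = ast.parse(source, mode="eval")

# Step 2: compile the tree into a code object (byte code).
code = compile(tree, "<json>", "eval")

# Step 3: execute the byte code to obtain the value.
via_eval_steps = eval(code)

# A dedicated parser goes from text to objects directly.
via_parser = json.loads(source)

print(type(tree).__name__, type(code).__name__)
print(via_eval_steps == via_parser)
```

Calling eval on the string performs all three steps on every call, which is the extra work a purpose-built JSON decoder avoids.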
September 19, 2008 at 1:53 am
Any chance you could run your benchmark against jsonlib[1] as well? In my experience it’s almost as fast as cjson, with the added benefit of actually working[2].
[1] http://pypi.python.org/pypi/jsonlib/
[2] cjson.decode ('["\/"]'); cjson.encode ([u'\U0001D11E'])
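For reference, this is how the two inputs in [2] behave with Python 3's standard json module. It is used here only as a comparison point: the sketch does not reproduce or characterize cjson's behavior.

```python
import json

# An escaped solidus is valid JSON and decodes to a plain "/".
decoded = json.loads(r'["\/"]')

# A character outside the Basic Multilingual Plane (U+1D11E).
clef = "\U0001D11E"
encoded = json.dumps([clef])

# Round-tripping should give back the original character.
round_tripped = json.loads(encoded)

print(decoded, encoded, round_tripped == [clef])
```

With the default ensure_ascii=True, json.dumps writes the character as a UTF-16 surrogate pair of \u escapes, and json.loads joins the pair back into a single character.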