Quick and (less?)-Dirty JSON Speed Testing in Python

Back in a previous article, I made some bold claims. After a good vetting on Reddit, the incomparable effbot pointed me toward timeit (cf. notes):

Quick, dirty, and quite possibly deeply flawed.

The profiler’s designed for profiling, not benchmarking, and Python code running under the profiler runs a lot slower than usual — but C code isn’t affected at all.

To get proper results, use the timeit module instead.
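
In practice, effbot's advice boils down to the pattern used throughout the script below. A minimal sketch (Python 2, matching the rest of the code here; the payload dict is purely illustrative):

from timeit import Timer

# setup runs once; the statement is then timed over many iterations,
# so profiler overhead never enters the measurement
T = Timer("cjson.encode(data)",
          "import cjson; data = dict(a=1, b=[1, 2, 3])")
print sorted(T.repeat(5, 10000))   # five totals, each over 10,000 calls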

So, here is a revised analysis. It still looks like cjson strongly outperforms the others.* Most interestingly, I tried oblivion95’s suggestion to load JSON with eval, and it comes out slower than cjson, which seems implausible to me. I look forward to corrections.

Results

All times are in seconds. Each line below lists the five sorted totals returned by timeit's repeat(5, 10000), i.e. five runs of 10,000 calls each.

dumping to JSON

cjson dump nested_dict
0.096393 0.096989 0.097203 0.097859 0.098357
demjson dump nested_dict
4.589573 4.601798 4.609123 4.621567 4.625506
simplejson dump nested_dict
0.595901 0.596267 0.596555 0.597104 0.597633

cjson dump ustring
0.024242 0.024264 0.024453 0.024475 0.024548
demjson dump ustring
2.350742 2.363112 2.364416 2.365360 2.374244
simplejson dump ustring
0.039637 0.039668 0.039820 0.039890 0.039976

loading from JSON

cjson load nested_dict_json
0.042304 0.042332 0.042936 0.043246 0.043858
demjson load nested_dict_json
8.317319 8.332928 8.334701 8.367242 8.371535
simplejson load nested_dict_json
1.858826 1.862957 1.864221 1.864268 1.868705
eval load nested_dict_json
0.484512 0.485497 0.487538 0.487866 0.488751

cjson load ustring_json
0.045566 0.045803 0.045846 0.046027 0.046056
demjson load ustring_json
3.391110 3.401287 3.403575 3.408148 3.416667
simplejson load ustring_json
0.243784 0.244193 0.244920 0.245126 0.246061
eval load ustring_json
0.121635 0.121801 0.122561 0.123064 0.123563


## simple json testing

import simplejson
import cjson
import demjson

class A(object):
    def __init__(self):
        self.var1 = 1
        self.var2 = dict(a=1,b=2,c=3)

## TEST DATA
set_ = set([1,2,3,4])
nested_dict = dict(v1="a", v2="b", v3=dict(n1=1,n2=2,n3=3))
ustring = u"a string with some unicod Andre\202"
#In case anyone is wondering, unicod is a text-encoding used by Nova Scotian fishermen.
class_ = A()

## Dump and load methods
dumps = {
    "simplejson": simplejson.dumps,
    "cjson": lambda x: cjson.encode(x, encoding="utf-8"),
    "demjson": demjson.encode,
}
loads = {
    "eval": eval,
    "simplejson": simplejson.loads,
    "cjson": lambda x: cjson.decode(x, encoding="utf-8"),
    "demjson": demjson.decode,
}

## Can the functions handle different data types?
for thing_name in ("set_", "nested_dict", "ustring", "class_"):
    thing = eval(thing_name)
    for k, fun in dumps.iteritems():
        try:
            out = fun(thing)
            print "SUCCESS:  %s encodes %s" % (k, thing_name)
            print out
        except Exception, e:
            print "ERROR: %s failed to encode %s" % (k, thing_name)
            print "ERROR:", e

## Timing code (the old profiler calls are left commented out for comparison)
#from profile import run
from timeit import Timer
for thing_name in ("nested_dict", "ustring", ):
    thing = eval(thing_name)
    # time our various jsons
    for k, fun in dumps.iteritems():
        print k, "dump",  thing_name
        T = Timer("""fun(thing)""", "from __main__ import fun,thing")
        print " ".join(["%2.6f" % x for x in sorted(T.repeat(5, 10000))])
        #run("for ii in xrange(10000):  fun(thing)")

nested_dict_json = simplejson.dumps(nested_dict)
ustring_json = simplejson.dumps(ustring)

for thing_name in ("nested_dict_json", "ustring_json"):
    thing = eval(thing_name)
    # time our various jsons
    for k, fun in loads.iteritems():
        print k, "load", thing_name
        T = Timer("""fun(thing)""", "from __main__ import fun,thing")
        print " ".join(["%2.6f" % x for x in sorted(T.repeat(5, 10000))])
        #run("for ii in xrange(10000):  fun(thing)")

FOOTNOTES:

*  Whew! I really dodged a bullet this time! I thought I was going to have a lot of egg on my face about this. Some days, one gets lucky, right?


3 Comments on “Quick and (less?)-Dirty JSON Speed Testing in Python”

  1. Fredrik says:

    “I really dodged a bullet this time!”

    Well, I guess I should have mentioned that the C module was fast enough to win anyway. But if you look at the totals, the others are not quite as slow as the profiler results might have led you to believe.

    As for why the eval function is slower, that’s because it uses Python’s standard compiler, which means that it builds a parse tree, generates byte code, and then executes the byte code to get the result. A dedicated interpreter (which is what cjson really is) can simply build an object tree instead of a parse tree, and skip the other steps.

  2. writeonly says:

    Thanks for clarifying the mechanism for why eval is slower! I don’t know the core cpython implementation well enough to answer questions like that. Also, it was good to finally learn timeit.
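
    For anyone curious, the compile-then-execute path Fredrik describes is easy to see with the dis module. A minimal sketch (the literal here is just an example, not one of the benchmark payloads):

    import dis

    src = '{"v1": "a", "v2": "b", "v3": {"n1": 1}}'
    code = compile(src, "<json>", "eval")   # source -> parse tree -> byte code
    dis.dis(code)                           # inspect the generated byte code
    print eval(code)                        # execute the byte code to build the dict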

  3. John Millikin says:

    Any chance you could run your benchmark against jsonlib[1] as well? In my experience it’s almost as fast as cjson, with the added benefit of actually working[2].

    [1] http://pypi.python.org/pypi/jsonlib/
    [2] cjson.decode('["\/"]'); cjson.encode([u'\U0001D11E'])
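
For anyone who wants to try John's suggestion, jsonlib should drop straight into the benchmark dictionaries above. A rough sketch, assuming jsonlib's read/write entry points (newer releases also alias them as loads/dumps, so check the version you install):

import jsonlib

dumps["jsonlib"] = jsonlib.write   # serialize a Python object to a JSON string
loads["jsonlib"] = jsonlib.read    # parse a JSON string back into Python objects
# The timing loops iterate over the dumps and loads dictionaries,
# so re-running them picks up jsonlib automatically.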

