Quick and Dirty JSON Speed Testing in Python

[See updated post for analysis using timeit]

As per Poromenos’ request on Reddit, I decided to expand a bit on my cryptic comment about the major JSON packages in Python (simplejson, cjson, demjson):

My conclusion: use demjson if you really, really want to make sure everything is right and you don’t care at all about time. Use simplejson if you’re in the 99% of users who want reasonable performance over a broad range of objects. Use enhanced cjson 1.0.3x if you’re in the camp with reasonable JSON inputs and you need much faster (10x) speed... that is, if the JSON step is the bottleneck.

More worrisome: demjson didn’t properly handle the unicode string I threw at it.

The Test Setup

Python 2.4.3 on 64-bit Linux

simplejson:  1.9.2, with C extensions turned on.*
demjson:  1.3
cjson:  enhanced cjson 1.0.3x6 from http://python.cx.hu/python-cjson/

(We need the enhanced version for the simplified "encode using utf-8" interface.)

* To test for simplejson C extensions:
> assert getattr(simplejson, '_speedups', None), "no speedups enabled"
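
On a current Python 3, where the standard-library json module (a descendant of simplejson) is the usual choice, a comparable check could look like the sketch below. It assumes CPython's json.encoder module exposes c_make_encoder and sets it to None when the C accelerator is missing. The helper name json_c_speedups_enabled is made up for illustration and is not part of the setup tested here.

```python
import json.encoder


def json_c_speedups_enabled():
    """Return True if the stdlib json module found its C encoder.

    Assumption: CPython sets json.encoder.c_make_encoder to None
    when the _json accelerator module cannot be imported.
    """
    return getattr(json.encoder, "c_make_encoder", None) is not None


print("C speedups enabled: %s" % json_c_speedups_enabled())
```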

Test Code

## simple json testing

import simplejson
import cjson
import demjson

class A(object):
    def __init__(self):
        self.var1 = 1
        self.var2 = dict(a=1,b=2,c=3)

## TEST DATA
set_ = set([1,2,3,4])
nested_dict = dict(v1="a", v2="b", v3=dict(n1=1,n2=2,n3=3))
ustring = u"a string with some unicod Andre\202"
#In case anyone is wondering, unicod is a text-encoding used by Nova Scotian fishermen.
class_=  A()

## Dump and load methods
dumps = {
    "simplejson":  simplejson.dumps,
    "cjson":  lambda x:  cjson.encode(x,encoding="utf-8"),
    "demjson":  demjson.encode
}
loads = {
    "simplejson":  simplejson.loads,
    "cjson":  lambda x:  cjson.decode(x,encoding="utf-8"),
    "demjson":  demjson.decode
}

## Can the functions handle different data types
for thing_name in ("set_", "nested_dict", "ustring", "class_", ):
    thing = eval(thing_name)
    for k,fun in dumps.iteritems():
        try:
            out = fun(thing)
            print "SUCCESS:  %s enocdes %s" % (k,thing_name)
            print out
        except Exception, e:
            print "ERROR: %s failed to enocde %s" % (k,thing_name)
            print "ERROR:", e

## Profiling code
from profile import run
for thing_name in ("nested_dict", "ustring", ):
    thing = eval(thing_name)
    for k,fun in dumps.iteritems():
        print k, thing_name
        run("for ii in xrange(10000):  fun(thing)")
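
The profile module adds overhead to every function call it records, so it tends to penalize encoders that make many Python-level calls. A minimal sketch of plain wall-clock timing with timeit follows. It is written for Python 3 and uses the standard-library json module as a stand-in, since simplejson, cjson and demjson may not be installed; time_encoders is a hypothetical helper, not code from the test above.

```python
import json
import timeit

# Same payloads as the test code above.
nested_dict = dict(v1="a", v2="b", v3=dict(n1=1, n2=2, n3=3))
ustring = u"a string with some unicod Andre\202"

# Stand-in encoder table; add other libraries here if they are installed.
encoders = {"json": json.dumps}
payloads = {"nested_dict": nested_dict, "ustring": ustring}


def time_encoders(encoders, payloads, number=10000, repeat=3):
    """Return {(library, payload): best seconds for `number` calls}."""
    results = {}
    for lib_name, fun in encoders.items():
        for payload_name, payload in payloads.items():
            timer = timeit.Timer(lambda: fun(payload))
            # Take the minimum of several runs to reduce noise.
            best = min(timer.repeat(repeat=repeat, number=number))
            results[(lib_name, payload_name)] = best
    return results


results = time_encoders(encoders, payloads)
for (lib_name, payload_name), seconds in sorted(results.items()):
    print("%s %s: %.4f s per 10000 calls" % (lib_name, payload_name, seconds))
```

Taking the minimum over a few repeats is the usual timeit habit: slower runs mostly reflect other activity on the machine rather than the encoder itself.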

Capability Results

All three handled the nested dict fine; demjson’s output was the most compact, as advertised.

demjson improperly encoded the unicode string (any ideas why, anyone?)

demjson encoded the set as a list; the other two failed to encode it.

All failed to encode the class instance, as expected.
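
When sets or simple class instances do need to go through an encoder that rejects them, one common workaround is a fallback hook. The sketch below uses the default parameter of the standard-library json.dumps on Python 3. simplejson's dumps has a parameter with the same name, but whether version 1.9.2 behaves identically is an assumption worth checking. encode_extra is a hypothetical helper, and turning an instance into its attribute dict is just one possible convention.

```python
import json


class A(object):
    def __init__(self):
        self.var1 = 1
        self.var2 = dict(a=1, b=2, c=3)


def encode_extra(obj):
    """Fallback called only for objects json cannot encode itself."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)   # emit sets as sorted lists
    if isinstance(obj, A):
        return vars(obj)     # emit the instance's attribute dict
    raise TypeError("%r is not JSON serializable" % (obj,))


set_json = json.dumps(set([1, 2, 3, 4]), default=encode_extra)
class_json = json.dumps(A(), default=encode_extra, sort_keys=True)
print(set_json)
print(class_json)
```

The round trip is lossy: decoding gives back a list and a dict, not a set and an A instance.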

ERROR: cjson failed to enocde set_
ERROR: object is not JSON encodable
SUCCESS:  demjson enocdes set_
[1,2,3,4]
ERROR: simplejson failed to enocde set_
ERROR: set([1, 2, 3, 4]) is not JSON serializable
SUCCESS:  cjson enocdes nested_dict
{"v1": "a", "v2": "b", "v3": {"n1": 1, "n2": 2, "n3": 3}}
SUCCESS:  demjson enocdes nested_dict
{"v1":"a","v2":"b","v3":{"n1":1,"n2":2,"n3":3}}
SUCCESS:  simplejson enocdes nested_dict
{"v1": "a", "v2": "b", "v3": {"n1": 1, "n2": 2, "n3": 3}}
SUCCESS:  cjson enocdes ustring
"a string with some unicod Andre\u0082"
SUCCESS:  demjson enocdes ustring
"a string with some unicod Andre"
SUCCESS:  simplejson enocdes ustring
"a string with some unicod Andre\u0082"
ERROR: cjson failed to enocde class_
ERROR: object is not JSON encodable
ERROR: demjson failed to enocde class_
ERROR: ('can not encode object into a JSON representation', <__main__.A object at 0x2aaaae9e2b90>)
ERROR: simplejson failed to enocde class_
ERROR: <__main__.A object at 0x2aaaae9e2b90> is not JSON serializable
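
A note on the unicode test string: \202 inside a Python string literal is an octal escape, so the last character is code point U+0082, a C1 control character rather than an accented letter. The sketch below (Python 3, standard-library json as a stand-in) shows this and reproduces the \u0082 escape that cjson and simplejson printed. One possible reading of the demjson line is that it wrote the raw control character, which most terminals and browsers do not display; that is a guess, not something the log confirms.

```python
import json
import unicodedata

ustring = u"a string with some unicod Andre\202"

last_char = ustring[-1]
print(hex(ord(last_char)))              # code point of the final character
print(unicodedata.category(last_char))  # Unicode general category

# With the default ensure_ascii=True, non-ASCII characters are escaped.
encoded = json.dumps(ustring)
print(encoded)
```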

Timing Results

cjson is really, really fast, especially on the nested dict: 0.190 CPU seconds for 10,000 encodes under the profiler, versus 9.160 for simplejson and 30.770 for demjson.  In other (unpublished) experiments, it was even pretty close to the load/dump speeds of Python’s binary cPickle.

cjson nested_dict
         10003 function calls in 0.190 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
    10000    0.150    0.000    0.150    0.000 :3()
        1    0.030    0.030    0.180    0.180 :1(?)
        1    0.010    0.010    0.190    0.190 profile:0(for ii in xrange(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)

demjson nested_dict

         4290003 function calls (4160003 primitive calls) in 30.770 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   550000    2.060    0.000    2.060    0.000 :0(append)
    10000    0.050    0.000    0.050    0.000 :0(callable)
   960000    2.870    0.000    2.870    0.000 :0(chr)
    60000    0.180    0.000    0.180    0.000 :0(extend)
   960000    2.640    0.000    2.640    0.000 :0(has_key)
   170000    0.740    0.000    0.740    0.000 :0(hasattr)
   820000    2.790    0.000    2.790    0.000 :0(isinstance)
    20000    0.080    0.000    0.080    0.000 :0(iterkeys)
    90000    0.550    0.000    0.550    0.000 :0(join)
    80000    0.230    0.000    0.230    0.000 :0(len)
   140000    0.540    0.000    0.540    0.000 :0(ord)
    10000    0.090    0.000    0.090    0.000 :0(range)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
    20000    0.120    0.000    0.120    0.000 :0(sort)
        1    0.130    0.130   30.770   30.770 :1(?)
    30000    0.350    0.000    0.470    0.000 demjson.py:1220(encode_number)
    80000    3.500    0.000    6.630    0.000 demjson.py:1378(encode_string)
    10000    0.070    0.000   17.860    0.002 demjson.py:1714(encode)
130000/10000    3.210    0.000   17.750    0.002 demjson.py:1737(encode_helper)
20000/10000    2.390    0.000   16.910    0.002 demjson.py:1761(encode_composite)
    10000    0.310    0.000   30.640    0.003 demjson.py:1896(encode)
    20000    0.300    0.000    0.590    0.000 demjson.py:521(extend_and_flatten_list_with_sep)
    80000    0.750    0.000    1.130    0.000 demjson.py:730(isstringtype)
    10000    6.720    0.001   12.420    0.001 demjson.py:863(__init__)
    10000    0.100    0.000    0.100    0.000 demjson.py:910(_set_strictness)
        1    0.000    0.000   30.770   30.770 profile:0(for ii in xrange(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)

simplejson nested_dict
         1330003 function calls (950003 primitive calls) in 9.160 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    80000    0.370    0.000    0.370    0.000 :0(encode_basestring_ascii)
    20000    0.100    0.000    0.100    0.000 :0(id)
   270000    0.850    0.000    0.850    0.000 :0(isinstance)
    20000    0.070    0.000    0.070    0.000 :0(iteritems)
    10000    0.020    0.000    0.020    0.000 :0(join)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.030    0.030    9.160    9.160 :1(?)
    10000    0.110    0.000    9.130    0.001 __init__.py:190(dumps)
400000/260000    3.050    0.000    6.220    0.000 encoder.py:212(_iterencode_dict)
500000/260000    3.690    0.000    8.100    0.000 encoder.py:283(_iterencode)
    10000    0.860    0.000    9.020    0.001 encoder.py:345(encode)
    10000    0.010    0.000    0.010    0.000 encoder.py:369(iterencode)
        1    0.000    0.000    9.160    9.160 profile:0(for ii in xrange(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)

cjson ustring
         10003 function calls in 0.090 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
    10000    0.030    0.000    0.030    0.000 :3()
        1    0.060    0.060    0.090    0.090 :1(?)
        1    0.000    0.000    0.090    0.090 profile:0(for ii in xrange(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)

demjson ustring
        2500003 function calls in 16.090 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50000    0.130    0.000    0.130    0.000 :0(append)
    10000    0.020    0.000    0.020    0.000 :0(callable)
   960000    2.650    0.000    2.650    0.000 :0(chr)
   970000    2.390    0.000    2.390    0.000 :0(has_key)
    10000    0.010    0.000    0.010    0.000 :0(hasattr)
    70000    0.300    0.000    0.300    0.000 :0(isinstance)
    20000    0.120    0.000    0.120    0.000 :0(join)
    10000    0.040    0.000    0.040    0.000 :0(len)
   330000    0.980    0.000    0.980    0.000 :0(ord)
    10000    0.080    0.000    0.080    0.000 :0(range)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.090    0.090   16.090   16.090 :1(?)
    10000    2.060    0.000    3.490    0.000 demjson.py:1378(encode_string)
    10000    0.070    0.000    3.980    0.000 demjson.py:1714(encode)
    10000    0.250    0.000    3.870    0.000 demjson.py:1737(encode_helper)
    10000    0.220    0.000   16.000    0.002 demjson.py:1896(encode)
    10000    6.610    0.001   11.780    0.001 demjson.py:863(__init__)
    10000    0.070    0.000    0.070    0.000 demjson.py:910(_set_strictness)
        1    0.000    0.000   16.090   16.090 profile:0(for ii in xrange(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)

simplejson ustring
         50003 function calls in 0.310 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.050    0.000    0.050    0.000 :0(encode_basestring_ascii)
    20000    0.070    0.000    0.070    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.040    0.040    0.310    0.310 :1(?)
    10000    0.090    0.000    0.270    0.000 __init__.py:190(dumps)
    10000    0.060    0.000    0.180    0.000 encoder.py:345(encode)
        1    0.000    0.000    0.310    0.310 profile:0(for ii in xrange(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)
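
To put the totals above side by side, here is a small sketch that divides each library's total CPU seconds by cjson's. The numbers are copied from the profile headers above; slowdown_vs is a hypothetical helper name. Keep in mind that these totals include profiler overhead, which grows with the number of function calls recorded, so the ratios likely overstate the gap for the pure-Python code paths.

```python
# Total CPU seconds for 10000 encodes, copied from the profile output above.
totals = {
    "nested_dict": {"cjson": 0.190, "demjson": 30.770, "simplejson": 9.160},
    "ustring": {"cjson": 0.090, "demjson": 16.090, "simplejson": 0.310},
}


def slowdown_vs(baseline, per_lib):
    """Return each library's time as a multiple of the baseline's time."""
    return dict((lib, secs / per_lib[baseline]) for lib, secs in per_lib.items())


ratios = dict((name, slowdown_vs("cjson", per_lib)) for name, per_lib in totals.items())

for payload_name in sorted(ratios):
    for lib in sorted(ratios[payload_name]):
        print("%s %s: %.1fx cjson" % (payload_name, lib, ratios[payload_name][lib]))
```

On these figures simplejson comes out at roughly 48x cjson for the nested dict but only about 3.4x for the unicode string, so the size of the gap depends heavily on the payload.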

Final Conclusion

I also think simplejson is pretty rad. In my applications, though, JSON conversion is one of the bottlenecks (rather than network timing, for example), and I have very well-defined, trusted input and output, so I use cjson-enhanced.

2 Comments on “Quick and Dirty JSON Speed Testing in Python”

  1. Kathline Supino says:

    All things are very open and intensely clear explanation of issues. was truly information. Your site is very beneficial. Many thanks sharing.

  2. Just btw – there is an anyjson project: https://bitbucket.org/runeh/anyjson

    It automatically loads the fastest JSON library installed on your system.
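
The idea behind that suggestion can be sketched as a chain of import attempts. This is not anyjson's actual implementation, just an illustration of the pattern. pick_json_backend is a hypothetical helper, and the preference order (cjson, then simplejson, then the standard-library json) is an assumption based on the timings in this post. Note that cjson names its functions encode/decode rather than dumps/loads, so the helper returns a uniform pair.

```python
def pick_json_backend():
    """Return (name, dumps, loads) for the first JSON library that imports.

    The order is an assumption: fastest first according to the timings
    in this post, with the standard library as the last resort.
    """
    try:
        import cjson
        return "cjson", cjson.encode, cjson.decode
    except ImportError:
        pass
    try:
        import simplejson
        return "simplejson", simplejson.dumps, simplejson.loads
    except ImportError:
        pass
    import json
    return "json", json.dumps, json.loads


backend_name, dumps, loads = pick_json_backend()
print("using %s" % backend_name)
print(dumps({"v1": "a"}))
```

The backends differ in edge cases (sets, unicode handling, strictness), as the capability results show, so swapping them silently is only safe for well-behaved inputs.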

