Recipe: All Linode StackScripts

Lately I’ve been playing around with Linode. One of its neat (but incomplete, hard to use, and not as good as it could be) features is Linode StackScripts. I wanted a full set of them to study while writing my own. Browsing through them in the web interface is a hassle: it's hard to grep them or copy from them (the listings have line numbers, and the only way to remove the line numbers is through JavaScript), etc. This Python (2.x) script downloads all of them to the current working directory.

Season to taste, public domain license!

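import re
import urllib
import htmlentitydefs

# The script body sits in a CDATA section just before a closing </div>;
# \s* absorbs the blank lines around it. (Raw string, no re.VERBOSE:
# VERBOSE silently discards literal newlines in a pattern.)
exp = re.compile(r'\[CDATA\[\s*(.*?)\s*\]\]></div>', re.DOTALL)

# matches any named HTML entity, e.g. &amp; or &lt;
myrx = re.compile('&(' + '|'.join(htmlentitydefs.entitydefs.keys()) + ');')
def dehtml(s):
    '''Replace named HTML entities in s with the characters they stand for.'''
    return myrx.sub(lambda m: htmlentitydefs.entitydefs[m.group(1)], s)

last_seen = 0
ii = 0
while 1:
    ii += 1
    if ii - last_seen > 100:
        print ("stopping at %i" % ii)
        break # 100 IDs in a row with no script: we've probably seen them all
    try:
        text = urllib.urlopen("http://www.linode.com/stackscripts/view/?StackScriptID=%i#viewSource" % ii).read()
    except IOError:
        continue # network trouble or no such page; try the next ID

    match = exp.search(text)
    if match is None:
        continue # this page has no script source
    ans = dehtml(match.group(1))
    last_seen = ii
    print ("creating stackscript_%i.sh" % ii)
    ofh = open("stackscript_%i.sh" % ii, 'w')
    ofh.write(ans)
    ofh.close()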
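For Python 3, where urllib.urlopen and htmlentitydefs no longer exist, here is a minimal sketch of just the extraction step. It assumes the page still wraps the script in a CDATA section followed by a closing </div> (the sample page below is made up), and it swaps the hand-rolled entity regex for the standard library's html.unescape, which also handles numeric entities:

```python
import html
import re

# Assumed page layout: the script source sits inside a CDATA section
# that is immediately followed by a closing </div>.
CDATA_RX = re.compile(r'\[CDATA\[\s*(.*?)\s*\]\]></div>', re.DOTALL)

def extract_script(page_text):
    """Return the unescaped script body, or None if the page has none."""
    match = CDATA_RX.search(page_text)
    if match is None:
        return None
    # html.unescape decodes named (&amp;) and numeric (&#38;) entities
    return html.unescape(match.group(1))

# Hypothetical page fragment, for illustration only
sample = '<div><![CDATA[\n\n#!/bin/bash\necho "a &amp; b" &gt; out.txt\n]]></div>'
print(extract_script(sample))
```

Fetching a page on Python 3 would go through urllib.request.urlopen(url).read(), which returns bytes, so decode them before searching.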


Working Around WordPress

It’s official… I *hate* WordPress syntax and its blog entry box. As the years have gone on, I’ve gotten more annoyed with wiki syntaxes in general. They all feel both overcomplicated and underpowered at the same time. In particular, both the MediaWiki and WordPress handling of *code* is horrible. I do like both POD and reStructuredText, and tend to write all my docs as .rst files these days.

But reStructuredText has its own problems. Until there is a simple rst->html script (which I should get around to finding or writing), it’s a bit of a hassle to use.

So my current solution is… use Stack Overflow. That’s right: SO has an easy-to-amend, easy-to-use, ReST-like syntax (Markdown). Just type into the “Ask Question” box, highlight the rendered preview, (in Firefox, at least!) choose “View Selection Source”, and paste the HTML back into the WordPress HTML entry area. Not trivial, but easy enough.


An IanB-tastic day, or How A Bug Becomes a Fix

People don’t write enough about how they catch, report, and fix bugs. I hope others will follow my lead in exposing the process more.

  1. Tried to read IanB’s revised WebOb tutorial at
    http://pythonpaste.org/webob/do-it-yourself.html
  2. where I got annoyed by how hard it is to copy and paste the code
    with the “>>>” and “...” prompts in the way. At the first example!
  3. Since Sphinx is where those prompts come from, spent some time in #python-docs
    discussing solutions with Taggnostr, including ones that other
    code highlighters use.
  4. Thought about how IanB probably likes that the code isn’t just
    cut-and-pastable, since typing it in yourself is much better for learning.
    Decided that I didn’t care!
  5. Built on Taggnostr’s jQuery-based fix in the installed
    doctools.js file.
  6. Branched and checked out Sphinx from Bitbucket to work on
    it more formally…
  7. where I promptly made a mess of things. I don’t know jQuery or
    JavaScript very well, so there was a lot of fussing. The problem
    was that between my version and tip, underscore.js was added, so
    using the new doctools.js file in my generated Sphinx HTML tree
    was causing some silent errors! Boy, JS seems to be hard to troubleshoot,
    and not very good at failing loudly!
  8. After finishing my fixes, committed them to my Bitbucket branch and pushed.
  9. Made a pull request, where I discovered that when you make a pull request
    on your own branch at BB, it sends the request to you. This seems, hm…, unintuitive!
  10. After asking about it in #mercurial, who answered, and #bitbucket, who didn’t,
    people agreed that this was, um, odd behaviour.
  11. So, bug time at BitBucket, where after a search, I found Bug 681…
    (http://bitbucket.org/jespern/bitbucket/issue/681/master-repositories-dont-need-pull-request)
  12. …which was filed by IanB!
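For the curious, the core of the fix is simple enough to sketch. The real change was jQuery in Sphinx's doctools.js; this is the same idea written in Python instead, with a hypothetical strip_prompts helper: keep the lines that carry a “>>> ” or “... ” prompt, strip the prompt, and drop the interpreter's output.

```python
def strip_prompts(session):
    """Turn a pasted interactive session into runnable code.

    Keeps lines starting with '>>> ' or '... ' (minus the prompt)
    and drops everything else, i.e. the interpreter's output.
    """
    code_lines = []
    for line in session.splitlines():
        if line.startswith(('>>> ', '... ')):
            code_lines.append(line[4:])
        elif line in ('>>>', '...'):
            # a bare prompt stands for an empty line in the code
            code_lines.append('')
    return '\n'.join(code_lines)

example = """>>> def greet(name):
...     return 'hi ' + name
...
>>> greet('bob')
'hi bob'"""
print(strip_prompts(example))
```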

Whython – Python For People Who Hate Whitespace

Whython : Whitespace Haters Python

http://writeonly.files.wordpress.com/2010/04/whython.png?w=590

Example

Clearly Confusing (standard 3.x):

for ii in range(10):
    print(ii)
    print("which is %s" % (['even','odd'][ii % 2]))

Improved:

for ii in range(10) {
    print(ii);
    print("which is %s" % (['even','odd'][ii % 2]));
}

Maximum Enterprise Whythonic:

for ii in range(10) { print(ii); print("which is %s" % (['even','odd'][ii % 2])); }

How about some Lisp with your Python?

defun myfun():  return 1
assert myfun() == 1

Or add some Ruby shine?

def myfun() BEGIN return 1; END
assert myfun() == 1

Why Whython?

  • Less Whitespace, More Enterprise
  • It’s not a real language without braces and semicolons
  • Whitespace delimited is like so restrictive, man!
  • Python sucks for code golf
  • Finally, a Python for everyone who can’t decide between tab and space
  • Possibly (as in the mathematical sense – a small non-zero probability)
    useful for doing command-line one-liners in Python
  • Help determine how bad a PEP/development idea needs to be before
    someone gets kickbanned from #python-dev.

More seriously

  • reading the Dragon Book [Aho86] gives a person dangerous ideas
  • good excuse to deep dive into the Python interpreter source code and the ast and dis modules
  • I finally wanted to learn GDB and Python's -d (parser debug) mode
  • humoring trolls is fun
  • for education, the whitespace thing really can cause problems. When
    copying code out of books into IDLE or IPython, there are corner cases when
    it terminates blocks “too early”, confusing new learners.
  • preparation for the “Python Spring Cleaning” project, to see how hard it is
    to get and modify source, write a PEP, raise bug ideas, talk in irc, etc.
  • since this is unlikely to ever be adopted by Python (I hope!), it will
    remain a useful exercise, unlike other “bugs”, which get fixed once and for
    all

Want It? (Download and Install)

Are you sure you can handle this level of awesome? Okay! Download and install:

http://bitbucket.org/gregglind/python-whython3k/src/

## Get the source!
$ hg clone https://gregglind@bitbucket.org/gregglind/python-whython3k/
    # or, if you haven't jumped on the Mercurial (http://mercurial.selenic.com/wiki/Tutorial) bandwagon,
    # then:  wget http://bitbucket.org/gregglind/python-whython3k/get/79a2c77fe3e1.zip and unzip it!
$ cd python-whython3k
$ ./configure  # go make a pot of tea
$ make         # go watch an episode of The IT Crowd (http://www.netflix.com/WiMovie/The_IT_Crowd_Series_1/70113774)
$ ./whython    # beautiful failure begins

Limitations

  • only simple statements (simple_stmt in the grammar) are really usable this way. That means that
    compound statements (function definitions, if, else, etc.) can’t be nested inside a braced block.
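Standard Python draws a similar line for one-line suites: simple statements can follow a colon on the same line, separated by semicolons, but a compound statement cannot be nested there. Here is a quick check with the built-in compile() (the compiles helper is just for illustration):

```python
def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError:
        return False
    return True

# simple statements may follow the colon on one line ...
print(compiles("for ii in range(3): print(ii); print(ii * 2)"))
# ... but a compound statement such as `if` may not
print(compiles("for ii in range(3): if ii: print(ii)"))
```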

Thanks to

  • The Authors of PEP 306
  • GVR, Martin v. Loewis (my umlaut is misbehaving!), Georg Brandl, Greg Ewing, Jeremy Hylton and others on the
    Python-Dev mailing list
  • Fred Drake, for responding to my crazy and incoherent email
  • gutworth, merwok, __ap__ and others in #python-dev

References

[Aho86] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman.
Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
http://www.amazon.com/exec/obidos/tg/detail/-/0201100886/

You Know, For Kids!

Swear words are a powerful motivator for novice programmers, and an unfortunate byproduct of advanced ones.

Teaching the computer to swear at you in random, creative ways is a powerful experience. It’s easy in most modern operating systems that include scripting languages (thanks for nothin’, Windows!*). So, you out there with the kids, the friends who want to learn, the curious, pass the magical power of curse words along.

* Yes, it’s even possible on Windows thanks to VBScript! On Mac / Linux / Unix, use your favorite: Python / Ruby / Bash / ….!

(inspired by Juliet at how-do-the-young-start-programming-nowadays)
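Here is a minimal sketch of the idea in Python, with deliberately mild, made-up word lists (swap in whatever vocabulary suits your audience):

```python
import random

# Hypothetical word lists; season to taste (and to audience).
ADJECTIVES = ["blithering", "scurvy", "lily-livered", "muddle-headed"]
NOUNS = ["nincompoop", "bilge rat", "turnip", "scallywag"]

def insult(rng=random):
    """Build a random insult such as 'You scurvy turnip!'."""
    return "You %s %s!" % (rng.choice(ADJECTIVES), rng.choice(NOUNS))

if __name__ == "__main__":
    for _ in range(3):
        print(insult())
```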


String Join Aggregate Function in PostgreSQL 8.1

Sometimes I store data in denormalized ways, even in Postgres. This can have performance benefits, and it can definitely be faster to develop, easier to understand, and in general, Lower Stress ™. I tend to use comma-delimited text in those sorts of denormalized fields. In this scenario, it’s useful to have an aggregate function to join such fields. So, based on Abulyadi/ and some chatter in #postgresql on freenode (vol7ron, xzilla, others), here is an 8.1 idiom for “string join aggregate”:

/* 8.2 and 8.3; 8.4 and later already have a built-in array_agg */
CREATE AGGREGATE array_agg(anyelement) (
    SFUNC=array_append,
    STYPE=anyarray,
    INITCOND='{}'
);
/* 8.1; the format for CREATE AGGREGATE changes in later versions */
CREATE AGGREGATE array_agg (
    SFUNC = array_append,
    BASETYPE = anyelement,
    STYPE = anyarray,
    INITCOND = '{}'
);

Once array_agg is created, you can call it as:

SELECT array_to_string(array_agg(some_field), ',') FROM some_table;
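The same “state plus transition function” shape exists outside PostgreSQL. As a comparison only, here is a sketch using SQLite through Python's standard sqlite3 module, with a made-up aggregate name. The list built in __init__ plays the role of STYPE, step() plays SFUNC, and finalize() does the array_to_string join.

```python
import sqlite3

class StringJoin:
    """Custom aggregate: collects values, then joins them with commas."""

    def __init__(self):
        self.items = []  # running state, like STYPE = anyarray

    def step(self, value):
        # called once per row, like SFUNC = array_append
        if value is not None:  # skip NULLs, as array_to_string does
            self.items.append(str(value))

    def finalize(self):
        # called once at the end, like array_to_string(..., ',')
        return ",".join(self.items)

conn = sqlite3.connect(":memory:")
conn.create_aggregate("string_join", 1, StringJoin)
conn.execute("CREATE TABLE some_table (some_field TEXT)")
conn.executemany("INSERT INTO some_table VALUES (?)", [("a",), ("b",), ("c",)])
print(conn.execute("SELECT string_join(some_field) FROM some_table").fetchone()[0])
```

SQLite also ships a built-in group_concat(x, separator) that does this directly.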

Quibble, a Damn Small Query Language (DSQL) Using Python

This intermediate-level article will demonstrate how to use the filter idiom, delegation tables, list comprehensions and generator expressions, and the operator module to create a compact but expandable language for querying data.

When many people hear the word ‘query’, their minds jump to Structured Query Language (SQL).  Now I love SQL as much as anyone[1].  Using SQL for queries is wonderful when one’s data is already loaded into a SQL database[2].  Sometimes the Real World (TM) conspires against this, since:

  • the data might be heterogeneous
  • the data might be easy to express in Python terms, but tedious to refactor into a normalized form.  As a quick example, consider a dict of sets, which would require a join and a foreign key and actual *gasp* schema design.
  • one might not have access to a database (though with SQLite being embedded in Python from 2.5 onward, this is less an issue)
  • one might have irrational biases against schemas and the straitjacketing that they impose on agile development and programmer whimsy.  I suffer from this bias myself, and attend regular SQL indoctrination meetings, but so far it’s not sticking!  NoSQL Forever!
  • SQL is enterprisey, but not Web2.0, man!

That said, SQL has lots of advantages:

  • Extremely flexible, complex querying
  • Widely deployed
  • (etc, etc.)

Let’s begin by building a list of dictionaries to query against.  These could be any list of objects that support a dictionary interface.  Note that these objects are heterogeneous.  Also note they are quite contrived, and rather boring.

# a list of dicts to query against
data = [
    dict(a=None, b=1, c=[1,2,3]),
    dict(a=13, d=dict(a=1,b=2)),
    dict(c=13,e="some string"),
    dict(c=10,e="some other string"),
    dict(a=10,e="some other string"),
    {('author','email'): ('Gregg Lind','gregg.lind at fakearoo.com')},
]

Now that we have some data, we’re going to build a simple query language called Quibble [3] to search against it.  We will be using the filter/pipeline idiom.  The filter idiom is quite simple:  if an object matches some condition, keep it; else move on.  On Unix, this is a very simple type of pipeline; when one wants venture capital, call it “map-reduce”.  While Python has a filter function (http://docs.python.org/library/functions.html#filter), a list comprehension will be quite a bit simpler to use for our dumb purposes.
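Here is the filter idiom in miniature, on a few made-up rows: filter() with a predicate and the equivalent list comprehension keep exactly the same items. Swapping the brackets for parentheses turns the comprehension into a lazy generator expression.

```python
# A few heterogeneous dicts, made up for this example
rows = [dict(c=13, e="some string"), dict(c=10), dict(a=10)]

def has_c_13(row):
    """Predicate: keep rows whose 'c' is 13 (a missing key is no match)."""
    return row.get('c') == 13

with_filter = list(filter(has_c_13, rows))
with_comprehension = [row for row in rows if has_c_13(row)]
print(with_filter == with_comprehension)  # True
print(with_comprehension)                 # [{'c': 13, 'e': 'some string'}]
```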

Next we will build a delegation table.  This simple dictionary maps names like “<=” to functions.  When people talk about the power of ‘functions are first-class objects’, this is part of what they’re on about:  we can map shorthand names to *unevaluated functions*, the function objects themselves, to be called later.

To make our lives easier, Quibble will use a simple convention for defining what is a valid query operator.  An ‘operator function’ must take exactly two arguments, following this format:

     my_operator(some_dict[key], value)

Luckily for us [4], the functions in the python operator module http://docs.python.org/library/operator.html mostly take this form.  Having this same calling convention will make it possible to just drop the ‘right’ function in.
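The convention has one wrinkle worth checking: operator.contains(a, b) computes `b in a`. With the field value first, an "in" operator built on it therefore asks whether the field contains the query value (a substring for strings, a member for lists or sets). Here is a quick check, with a negated form written to keep the same argument order:

```python
import operator

field_value = "some other string"   # plays the role of some_dict[key]
query_value = "other"               # plays the role of value

# operator.contains(a, b) is equivalent to `b in a`
print(operator.contains(field_value, query_value))   # True
print(query_value in field_value)                     # True

def not_in(x, y):
    """Negated 'in' with the same argument order as operator.contains."""
    return y not in x

print(not_in(field_value, "absent"))                  # True
print(not_in(field_value, "other"))                   # False
```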

import operator
operators = {
    "<" : operator.lt,
    "<=" : operator.le,
    "==" : operator.eq,
    "!=" : operator.ne,
    ">=" : operator.ge,
    ">"  : operator.gt,
    "in" : operator.contains,
    "nin" : lambda x,y: not operator.contains(x,y),
}

Note that for ‘nin’, we had to wrap operator.contains.  Python’s lambda expression makes this easy, and the resulting code is still easy to read.  We could also use a true named function here, like this:

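def nin_(x,y):
    return y not in x   # same argument order as operator.contains(x,y)

Or a simpler lambda:

"nin" :  lambda x,y:  y not in x,

With the delegation table in place, the first version of query is short:
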
def query(D,key,val,operator="=="):
    '''
    D:  a list of dictionaries
    key:  the key to query
    val:  the value
    operator:  "==", ">=", "in", et al.

    Returns the elements x of D such that operator(x.get(key,None), val) is true
    '''
    try:
        op = operators[operator]
    except KeyError:
        raise ValueError, "operator must be one of %r" % sorted(operators)
    return [x for x in D if op(x.get(key,None),val)]

print "1st version"
print query(data,'a',1)
print query(data,'c',None,'!=')

Excellent.  Time to retire to a private island.  Oh wait, you want to define new functions?  Chain these queries together?  It should handle exceptions?  We can fix those.

A more fundamental problem with this filter approach is that defining “or” conditions is quite awkward, since filters reduce the input set at each stage, but we will clean this up as well (but it will be ugly).
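One way to get “or”, sketched here as an alternative rather than as Quibble's design: turn each condition into a predicate and combine predicates with any() (for “or”) and all() (for “and”), so each row is tested once instead of narrowing the set stage by stage. The helper names cond, any_of, and all_of are hypothetical.

```python
import operator

def cond(key, op, value):
    """Build a predicate testing op(row.get(key), value); errors count as False."""
    def predicate(row):
        try:
            return bool(op(row.get(key), value))
        except TypeError:
            return False  # e.g. comparing None with an int in Python 3
    return predicate

def any_of(*preds):
    return lambda row: any(p(row) for p in preds)

def all_of(*preds):
    return lambda row: all(p(row) for p in preds)

rows = [dict(a=13), dict(c=10, e="some other string"), dict(a=10)]

# a == 13 OR c >= 10
wanted = any_of(cond('a', operator.eq, 13), cond('c', operator.ge, 10))
print([row for row in rows if wanted(row)])

# c >= 10 AND "other" in e
both = all_of(cond('c', operator.ge, 10), cond('e', operator.contains, 'other'))
print([row for row in rows if both(row)])
```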

Let’s add some functionality.

  • operator can be any two argument function
  • return an iterator instead of a list
  • tee the original input, just in case it too is an iterator, we don’t want to exhaust it.
  • add a keynotfound argument, to change what happens if the key isn't found in the dict

import itertools
import inspect
def _can_take_at_least_n_args(f,n=2):
    ''' helper to check that a function can take at least n positional args'''
    try:
        (pos, varargs, kwargs, defaults) = inspect.getargspec(f)
    except TypeError:
        return True  # builtins (e.g. operator.lt) can't be introspected; trust them
    return varargs is not None or len(pos) >= n

def query(D,key,val,operator="==", keynotfound=None):
    '''
    D:  a list of dictionaries
    key:  the key to query
    val:  the value
    operator:  "==", ">=", "in", et al., or any two-arg function
    keynotfound:  value if key is not found

    Returns the elements x of D such that operator(x.get(key,keynotfound), val) is true
    '''
    D = itertools.tee(D,2)[1]  # take a teed copy

    # let's let operator be any two argument callable function, *then*
    # fall back on the delegation table.
    if callable(operator):
        if not _can_take_at_least_n_args(operator,2):
            raise ValueError ("operator must take at least 2 arguments")
            # alternately, we could wrap it in a lambda, like:
            # op = lambda x,y: operator(x)
            # but we have to check to see how many args it really wants (inc. 0!)
        op = operator
    else:
        op = operators.get(operator,None)
    if not op:
        raise ValueError, "operator must be one of %r, or a two-argument function" % sorted(operators)

    def try_op(f,x,y):
        try:
            return f(x,y)
        except Exception:
            return False  # any error at all counts as "no match"

    return (x for x in D if try_op(op, x.get(key,keynotfound),val))

print "2nd version"
print list(query(data,'a',1))
print list(query(data,'c',None,'!='))
at_fakaroo = lambda k,v:  "fakearoo" in k[1] # v will be irrelevant
print list(query(data, ('author','email'), None, at_fakaroo, keynotfound=('','')))

That is looking quite a bit more powerful!  It still has lots of problems:

  • ‘or’ isn’t well supported.
  • we handle all errors in the function equivalently — by eating them!  This will make it really hard to debug, since none of us writes perfect code.
  • chaining queries is doable via nesting, but it’s ugly (see below).
  • relies on the dictionary interface
  • awkward to peer inside nested components
  • doesn’t handle attribute lookup easily (but could be modified to, using getattr http://docs.python.org/library/functions.html#getattr)
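Here is a possible lookup helper for the last two points (hypothetical, not part of Quibble). It follows a dotted path through nested dicts and falls back on getattr for anything that isn't a dict. Note that a dotted path can't express tuple keys such as ('author','email').

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None

def lookup(obj, path, default=None):
    """Follow a dotted path like 'd.b' through dicts and attributes.

    Each step uses dict.get for dicts and getattr for everything else;
    returns default as soon as a step cannot be resolved.
    """
    for part in path.split('.'):
        if isinstance(obj, dict):
            obj = obj.get(part, _MISSING)
        else:
            obj = getattr(obj, part, _MISSING)
        if obj is _MISSING:
            return default
    return obj

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

print(lookup(dict(a=13, d=dict(a=1, b=2)), 'd.b'))   # 2
print(lookup(dict(p=Point(3, 4)), 'p.y'))             # 4
print(lookup(dict(a=13), 'd.b', default='n/a'))       # n/a
```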

Let’s try to make a “Queryable” object that chains operations via method calls (something like
SQLAlchemy generative selects http://www.sqlalchemy.org/docs/05/sqlexpression.html#intro-to-generative-selects-and-transformations):

class Queryable(object):
    def __init__(self,D):
        self.D = itertools.tee(D,2)[1]

    def tolist(self):
        return list(itertools.tee(self.D,2)[1])

    def query(self,*args,**kwargs):
        return Queryable(query(self.D,*args,**kwargs))

    q = query

print "3rd version, Queryable"
# c > 8 and "other" in e
Q = Queryable(data).q('c',8,'>')
print Q.tolist()
Q = Q.q('e', 'other', 'in')
print Q.tolist()

This is OKAY, but it still has plenty of code smell.

  • lots of tee madness
  • ugly “tolist” method
  • we’re the query optimizer… which means every query is at least one full O(n) pass, since there is no indexing and no smarts at all in the querying.
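Here is a tee-free variant, sketched under the assumption that the input fits in memory as a list. Each q() call returns a new object carrying one more predicate, and filtering happens only when the result is iterated, so a query can be iterated, extended, and iterated again. The class name is hypothetical, and operators here are passed as functions rather than names.

```python
import operator

class LazyQueryable:
    """Chainable query: stores predicates, filters only when iterated."""

    def __init__(self, rows, predicates=()):
        # materialize once; later chained queries share the same list
        self.rows = rows if isinstance(rows, list) else list(rows)
        self.predicates = tuple(predicates)

    def q(self, key, value, op=operator.eq):
        def predicate(row):
            try:
                return bool(op(row.get(key), value))
            except TypeError:
                return False  # incomparable values count as no match
        # return a new object; the original query stays reusable
        return LazyQueryable(self.rows, self.predicates + (predicate,))

    def __iter__(self):
        return (row for row in self.rows
                if all(p(row) for p in self.predicates))

rows = [dict(c=13, e="some string"), dict(c=10, e="some other string"), dict(a=10)]
Q = LazyQueryable(rows).q('c', 8, operator.gt)
print(list(Q))
Q2 = Q.q('e', 'other', operator.contains)
print(list(Q2))
print(list(Q))   # unchanged: iterating Q2 did not exhaust Q
```

This does nothing about the O(n) scan; it only replaces the tee juggling and the tolist method with plain iteration (call list() when you need a list).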

Next steps / alternatives:

Knowing when to give up!

Like any domain-specific language, Quibble (as written here) walks a very fine line between functionality and complexity (okay, it stumbles over the line drunkenly, but not by too much!). If we need much more complexity in our queries (or object model), then we’re back to writing Python, and investigating a proper solution (SQL, Mongo, etc.) is probably worthwhile!  For a simple reporting language, or debugging, or a simple command-line interface, this might be plenty.

Happy Yule!

Notes:

1. Not true, I hate it.

2. Unless it’s super complex to query, involves lots of joins, or the query optimizer is off drunk at the pub, or stars are poorly aligned.

3. Quibble — from Query Bibble, Bibble being an ancient Etruscan word for a teething ring.

4. Well, actually, not lucky at all.  Like most scientific papers, this article pretends that inquiry is orderly.  I knew that I wanted to talk about the operator module, and most of the functions in operator take this form, so it seems like a sensible first-approximation convention.


Two Simple Tips to Speed up Python Time Parsing

  1. Sometimes, date parsing in Python takes a long time. It can be worth writing custom date-string converters that sacrifice generality for speed.
  2. Another oddity:  forcing the timezone can speed up code as well, like this: os.environ['TZ'] = 'GMT'

Both tips are demo’d and tested in the code snippet below.

import os
import time

def _convert_date(string, year=None):
    ''' take a log string, turn it into time epoch, tuple, string

    >>> _convert_date('Aug 19 13:45:01',2009)
    (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
    '''
    if year is None:  year = time.gmtime()[0]

    # general-purpose version: strptime handles any format, but slowly
    tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
    tt[-1] = 0 # turn off daylight saving time (isdst = 0)
    tt = tuple(tt)
    ts = int(time.mktime(tt))
    return (ts,tt,string)

_months = dict(jan=1,feb=2,mar=3,apr=4,may=5,jun=6,jul=7,aug=8,sep=9,oct=10,nov=11,dec=12)
def _convert_date2(string, year=None):
    ''' take a log string, turn it into time epoch, tuple, string

    >>> _convert_date2('Aug 19 13:45:01',2009)
    (1250689501, (2009, 8, 19, 13, 45, 1, 2, 231, 0), 'Aug 19 13:45:01')
    '''
    if year is None:  year = time.gmtime()[0]

    # was the strptime call below, but that profiled 4x slower
    #tt = list(time.strptime("%s " % year + string, "%Y %b %d %H:%M:%S"))
    mon,d,t  = string.split()
    h,m,s = t.split(":")
    mon = _months[mon.lower()]
    tt = [year, mon,d,h,m,s,0,0,0]
    tt = tuple([int(v) for v  in tt])
    ts = int(time.mktime(tt))
    tt = time.gmtime(ts)  # fills in weekday and day-of-year
    return (ts,tt,string)

# NOTE: mktime() works in local time, so the doctest values above and this
# assert only hold when the machine's local timezone is UTC/GMT.
assert _convert_date('Aug 19 13:45:01',2009) == _convert_date2('Aug 19 13:45:01',2009)

# The timing lines below are run in IPython: %timeit (plain timeit with
# automagic on) is like timeit.Timer with brains, including figuring out
# how many loops to run heuristically.

# key fact:  a microsecond is 1000 nanoseconds

timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)
os.environ['TZ'] = 'GMT'
timeit _convert_date('Aug 19 13:45:01',2009)
timeit _convert_date2('Aug 19 13:45:01',2009)

Results  (Python 2.4.3 on x64 Linux):

In [10]: timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 62 µs per loop

In [11]: timeit _convert_date2('Aug 19 13:45:01',2009)
10000 loops, best of 3: 18.3 µs per loop

In [12]: os.environ['TZ'] = 'GMT'

In [13]: timeit _convert_date('Aug 19 13:45:01',2009)
10000 loops, best of 3: 60.2 µs per loop

In [14]: timeit _convert_date2('Aug 19 13:45:01',2009)
100000 loops, best of 3: 13.3 µs per loop

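The Win Factor:

  • custom parser:  roughly 3.4x faster (62 µs → 18.3 µs per call)
  • setting TZ:  roughly 27% less time for the custom parser (18.3 µs → 13.3 µs), but only about 3% for the strptime version (62 µs → 60.2 µs)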
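One further option, assuming the log timestamps are UTC (the original doesn't say): skip mktime and the local timezone altogether. calendar.timegm is the UTC inverse of time.gmtime, so the result no longer depends on TZ. This is a Python 3 sketch with a hypothetical function name, and it has not been benchmarked against the numbers above.

```python
import calendar
import time

_MONTHS = dict(jan=1, feb=2, mar=3, apr=4, may=5, jun=6,
               jul=7, aug=8, sep=9, oct=10, nov=11, dec=12)

def convert_date_utc(string, year=None):
    """Parse 'Aug 19 13:45:01' as UTC; return (epoch, struct_time, string)."""
    if year is None:
        year = time.gmtime().tm_year
    mon, day, clock = string.split()
    hour, minute, second = (int(v) for v in clock.split(":"))
    # timegm reads only year..second from the tuple; the trailing zeros are ignored
    ts = calendar.timegm((year, _MONTHS[mon.lower()], int(day),
                          hour, minute, second, 0, 0, 0))
    return ts, time.gmtime(ts), string

print(convert_date_utc('Aug 19 13:45:01', 2009)[0])  # 1250689501
```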

Feedback and additional speedup improvements welcome.

(Thanks to Jon Nelson of the Pycurious blog for the TZ idea)


No Geek Bulls**t Programming Class (Results so Far)

The Project

Create an accessible ‘learn to program’ class, using Python. Undo damage and barriers to access around geek culture, endemic sexism and racism, and models that say that “only certain people can program”.

Bits and Bites (at TC ExCo)

Choose Your Own Pyventure (Wikibook)

Results So Far

So far there have been two class sessions. The gender mix (self-identified) is about 50/50/0 male/female/(genderqueer, intersex) and we have 10 students or so. The self-identified goals of students included: building programs for work, changing careers, remedying previous bad programming class experiences, (rarer) learning python specifically (after knowing some other language).

Lessons Learned (and some Theories)

# Make the class accessible

  • No alpha male bulls**t
  • No pissfighting over languages, programming backgrounds, etc. Remember, even experts start as newbies.
  • create safer, accessible spaces (make the space physically accessible, make childcare credits available, advertise to underserved communities, avoid gender / sexuality assumptions, respect pronouns, enforce safer-space rules)

# emotions matter in the learning experience

  • acknowledge the complexity of programming
  • programmers are made, not born
  • programming is hard to do, hard to learn
  • explain that it was hard for you to learn as well.
  • remind learners that making mistakes is how one learns to program

# Start far back. Go back further. Most students know little about how the computer works.

  • they haven’t seen / heard of / used the command line / terminal
  • they don’t know the difference between the shell and the python environment
    • they try things like “>>> python program.py” (a shell command typed at the Python prompt)
  • there will be mac and windows users, prepare for both
  • some learners will have programmed before, some will not

# Have a goal / main project for the course

  • connect with students.
  • build toward a full project
  • lessons should iteratively replace / improve / expand on code made during previous lessons
  • no math. Math algorithms are boring and irrelevant for most people. Python makes strings easy, and easy strings make for easy-to-discuss, real-world data

# Don’t get bogged down in syntax. People don’t care. Python has awesome syntax, mostly.

  • Gloss over warts and complexities
  • Avoid jargon

# Don’t get bogged down in datatypes. Don’t mention unicode. Ignore tuples.

  • Do mention strings, “numbers” (encompassing ints and floats)
  • dictionaries before lists. Associating keys and values parallels associating variable names with values. After teaching dicts, lists are trivial.

# relate functions and data structures. They are intertwined and need to be taught in parallel.

  • Functions exist to process data structures, and data exist to feed functions
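A possible first-lesson snippet along these lines, with made-up names and data: strings rather than math, a dictionary first, and a function whose job is to process it. The list appears only in passing, as the next thing to teach.

```python
# A dictionary associates keys with values, just as a variable name
# is associated with its value.
favorite_snacks = {
    "ana": "popcorn",
    "ben": "apples",
}

def describe(snacks):
    """Turn the snack dictionary into friendly sentences."""
    sentences = []   # a list: the next data structure, met in passing
    for name, snack in snacks.items():
        sentences.append(name.title() + " likes " + snack + ".")
    return sentences

for sentence in describe(favorite_snacks):
    print(sentence)
```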

# Ignore Objects and Object-Oriented Programming

  • OO isn’t hard, but it is confusing, especially for newbies
  • More importantly, it’s *irrelevant* for most early programming tasks

# Now matters more than Complete

  • Use Wikibooks or Google Docs for ease in sharing materials. (if repeating, we might choose GDocs — Wikibooks is too much machinery)
  • Don’t worry about getting all the details right

# POWERPOINT IS DEATH


Bits and Bites — Programming First Steps (free class)

After reading Kirrily Roberts’ OSCON keynote, and following links from there to GeekFeminism (via Lindsey Kuper), I’ve been riled up about barriers to access in the programming community. I come from a non-traditional programming background (more on that in later journals), and had a lot of baggage about the mystique of programming. So, a friend and I decided to do something about it, and have some free classes for non-traditional programmers through the Twin Cities Experimental College.



