No Geek Bulls**t Programming Class (Results so Far)
October 9, 2009
The Project
Create an accessible ‘learn to program’ class using Python. Undo the damage and barriers to access that come from geek culture, endemic sexism and racism, and models that say “only certain people can program”.
Results So Far
So far there have been two class sessions, with 10 or so students. The (self-identified) gender mix is about 50/50/0 male/female/(genderqueer, intersex). Students' self-identified goals included building programs for work, changing careers, remedying previous bad programming-class experiences, and (more rarely) learning Python specifically after knowing some other language.
Lessons Learned (and some Theories)
# Make the class accessible
- No alpha male bulls**t
- No pissfighting over languages, programming backgrounds, etc. Remember, even experts start as newbies.
- create safer, accessible spaces: make them physically accessible, make childcare credits available, advertise to underserved communities, avoid gender/sexuality assumptions, and respect pronouns. Enforce safer space.
# Emotions matter in the learning experience
- acknowledge the complexity of programming
- programmers are made, not born
- programming is hard to do, hard to learn
- explain that it was hard for you to learn as well.
- remind learners that making mistakes is how one learns to program
# Start far back. Go back further. Most students know little about how the computer works.
- they haven’t seen / heard of / used the command line / terminal
- they don’t know the difference between the shell and the python environment
- they try things like “>>> python program.py” (a shell command typed at the Python prompt)
- there will be mac and windows users, prepare for both
- some learners will have programmed before, some will not
# Have a goal / main project for the course
- connect with students.
- build toward a full project
- lessons should iteratively replace / improve / expand on code made during previous lessons
- no math. Math algorithms are boring and irrelevant for most people. Python makes strings easy, and easy strings make for easy-to-discuss, real-world data.
# Don’t get bogged down in syntax. People don’t care. Python has awesome syntax, mostly.
- Gloss over warts and complexities
- Avoid jargon
# Don’t get bogged down in datatypes. Don’t mention Unicode. Ignore tuples.
- Do mention strings, “numbers” (encompassing ints and floats)
- dictionaries before lists. Associating keys and values parallels associating variable names with values. After teaching dicts, lists are trivial.
# Relate functions and data structures. They are intertwined and need to be taught in parallel.
- Functions exist to process data structures, and data exist to feed functions
# Ignore Objects and Object-Oriented Programming
- OO isn’t hard, but it is confusing, especially for newbies
- More importantly, it’s *irrelevant* for most early programming tasks
# Now matters more than Complete
- Use Wikibooks or Google Docs for ease in sharing materials. (If repeating, we might choose GDocs; Wikibooks is too much machinery.)
- Don’t worry about getting all the details right
# POWERPOINT IS DEATH
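As an illustration of the dictionaries-before-lists ordering, and of teaching functions alongside the data they process, a first lesson might look something like this Python sketch. It is a hypothetical example, not the class's actual material; all names and values are made up.

```python
# Variables associate names with values...
name = "Ada"
city = "London"

# ...and a dictionary does the same thing, inside a single value.
person = {"name": "Ada", "city": "London"}
print(person["name"])  # Ada

# Once dicts make sense, a list is roughly "a dict whose keys are 0, 1, 2, ..."
people = ["Ada", "Grace", "Alan"]
print(people[0])  # Ada

# Functions and data go together: this function exists to process the dict.
def greeting(person):
    return "Hello, " + person["name"] + " from " + person["city"] + "!"

print(greeting(person))  # Hello, Ada from London!
```

Everything here is strings, in keeping with the "no math" guideline.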
When Great Features Aren’t Enough: Twisted, Tornado, the Zero-Step, and Activation Energy
September 12, 2009
Fresh on the heels of Tornado’s release, and the responses to it from Glyph (note 1) and others, I’ve been thinking about why Tornado excites me so much.
Twisted is a robust, powerful, scalable asynchronous web framework (among other things). We have used it successfully in the past. Taking its developers at their word, Tornado is also scalable, but it is focused on HTTP and much less fully featured than Twisted. It does provide authentication pieces (awesome!) and some other utilities. In architectural terms, Glyph is probably right that Tornado is incomplete (to be polite).
I still want to use Tornado.
Baby Steps into HBase
July 15, 2009
Today, after reading (the amazing and invaluable!) Understanding HBase and BigTable, while researching schemas for Google App Engine, I took my first tentative steps into using HBase. About HBase:
HBase is the Hadoop database. Its (sic) an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.
HBase’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Try it if your plans for a data store run to big.
Well, my plans don’t run to big, but they do run to indexed over time. Since every cell in an HBase table has a timestamp, it is really easy to snapshot data over time and to “roll back” a query as though it were asked at any point in the past. For data that changes rarely, but for which one wants a historical record, this might make querying with history much simpler.
Historical Data Example
Think about how an organization changes over time. Employees enter and leave; business units might be bought and sold. One approach to modeling this is to take a snapshot every day and store it in an RDBMS. The snapshots will hold a lot of redundant information, since an org doesn’t really change very much from day to day.
A simpler model is to enter a new snapshot of the organization only when it changes, essentially writing over the previous configuration. Since HBase automatically labels cells with timestamps, the history comes for free.
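To make the "write only on change, read as of any time" idea concrete, here is a toy in-memory Python sketch of timestamped cells. It is not HBase or its client API: `VersionedMap`, `put`, and `get(..., as_of=...)` are hypothetical names, and timestamps are plain integers (days) supplied by the caller rather than assigned by the store.

```python
import bisect

class VersionedMap:
    """Toy stand-in for HBase-style timestamped cells (not the HBase API)."""

    def __init__(self):
        # (row, column) -> (sorted list of timestamps, values in the same order)
        self._cells = {}

    def put(self, row, column, value, ts):
        stamps, values = self._cells.setdefault((row, column), ([], []))
        i = bisect.bisect_right(stamps, ts)  # keep timestamps sorted
        stamps.insert(i, ts)
        values.insert(i, value)

    def get(self, row, column, as_of=None):
        """Newest value written at or before `as_of` (None means latest)."""
        stamps, values = self._cells.get((row, column), ([], []))
        if as_of is None:
            return values[-1] if values else None
        i = bisect.bisect_right(stamps, as_of)
        return values[i - 1] if i else None

# Write only when the org chart changes, then ask questions about the past.
org = VersionedMap()
org.put("eng", "manager", "Ada", ts=1)
org.put("eng", "manager", "Grace", ts=40)
org.put("sales", "manager", "Alan", ts=10)

print(org.get("eng", "manager", as_of=20))   # Ada
print(org.get("eng", "manager"))             # Grace
print(org.get("sales", "manager", as_of=5))  # None
```

Nothing is stored for the days in between writes; the "as of" lookup simply finds the most recent write at or before the requested time.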
Setting it up
Using Ole-Martin Mørk’s instructions was a breeze! Even though I know almost nothing about Java and the Java environment, I managed it. I followed them, with these modifications:
- After downloading, unzipping, and symbolically linking the result to ~/hbase, I version-control the whole thing ( $ git init; git add . ; git commit -m "initial checkin, as unpacked from source" ), so that if I foul up anything, I can easily revert!
- Edit ~/hbase/conf/hbase-env.sh to set the right JAVA_HOME, which for me (Debian) is: export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
Since I don’t have passwordless ssh set up to localhost, I get this error:
~/hbase$ ~/hbase/bin/start-hbase.sh
localhost: ssh: connect to host localhost port 22: Connection refused
The rest of the example seems to run fine though, and I’m in no mood to really track this down, since I’m still in the experiment phase.
Future Steps
I’m not sure whether I’ll be going any deeper anytime soon, since I have a lot of SQLAlchemy code built around handling these sorts of ‘historical’ queries (where inserting and updating are the real difficulties!), but I do like the idea of easily versioned, map-like data stores quite a bit.
Lemon Candy and Dynamic Programming
December 22, 2008
Over at TheDailyWtf, hidden among some comments was an interesting dynamic programming problem:
Consider this problem:
George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.
Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.
But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he’ll only put a piece of candy back at most once per day.
What are the odds that when one piece of candy remains, it will be lemon flavored?
I posed the problem at a company where I used to work. All but one person tried to do it recursively. The remaining person tried to do it using an Excel spreadsheet!!!
Maybe I (and some of the other posters on that thread) are morons, but Excel (or in my case, OpenOffice) seemed like a fine way to solve it, so I did.
Read more about the Lemon-Cherry problem, and download the spreadsheets used to solve it
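For comparison with the spreadsheet approach, the same probabilities can be computed with a memoized recurrence. This is a swapped-in Python sketch, not the spreadsheet itself, and `p_last_is_lemon` is a hypothetical name. With `l` lemons and `c` cherries left (`n = l + c`), George eats a cherry only if he draws a cherry twice in a row, which happens with probability (c/n)², since the first cherry goes back into the bag; otherwise he eats a lemon.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_last_is_lemon(lemons, cherries):
    """Probability that the final remaining piece is lemon flavored."""
    n = lemons + cherries
    if n == 1:
        return 1.0 if lemons == 1 else 0.0
    prob = 0.0
    if lemons:
        # Lemon on the first draw, or cherry (put back) then lemon on the redraw.
        p_eat_lemon = lemons / n + (cherries / n) * (lemons / n)
        prob += p_eat_lemon * p_last_is_lemon(lemons - 1, cherries)
    if cherries:
        # Cherry on both draws: the second cherry gets eaten regardless.
        p_eat_cherry = (cherries / n) ** 2
        prob += p_eat_cherry * p_last_is_lemon(lemons, cherries - 1)
    return prob

print(p_last_is_lemon(90, 10))
```

Each state `(lemons, cherries)` is solved once and cached, so the 100-piece bag needs only about a thousand evaluations.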
Git-svn clone the last few revisions
December 15, 2008
It can be awfully tempting to make some changes to an existing open-source project [1]. Some of that excitement diminishes when one realizes how long a git-svn clone will take on a large project repo, like Python's. The quick history lookup that git-svn gives you is paid for up front.
Instead, we can do a “shallow copy” to get only the last few revisions. It seems that you need to use actual revision numbers for the first argument to -r, but I could be wrong. (I tried using HEAD~1000:HEAD.)
$ git svn clone -r 65000:HEAD http://svn.python.org/projects/python/trunk/ python-dev
If you find this is *still* taking too long, try canceling, changing into the directory, and issuing:
$ git svn fetch
Good luck all!
Notes
- Finally got my first contribution into Python: #4568, remove limitation in varargs callback example.
Simple “object-db” using JSON and python-sqlite
December 5, 2008
As part of a much larger project, I have a group of “snapshots” of a complicated data structure. I need to save these persistently and still have access to them when needed. My solution is to output the snapshots as JSON and store them in a sqlite database*, where they persist on disk as “jlobs” (JSON large objects).
This “sqlite as object-db” approach has several advantages:
- atomic transactions,
- easy database replication,
- the jlob format can change without affecting the schema,
- very light runtime requirements.
Building off the sqlite3 manual, it is easy to see how to get the JSON back *out* of the database.
There are drawbacks to this approach, of course:
- you’re responsible for building and maintaining tables indexing any queryable elements of your jlob, if you want to be able to access them using SQL.
- SQL normalization purists will throw up when they look at your schema
(*Note: if you are on CentOS 5 and do not have access to Python 2.5, install python-sqlite2 (for example, from one of these RPMs) rather than updating your python-sqlite in place. BAD THINGS WILL HAPPEN, including breaking yum.)
#!/usr/bin/python
import sys
# Python 2.5+ ships sqlite3; older Pythons need the pysqlite2 package.
if sys.version_info >= (2, 5):
    import sqlite3
else:
    from pysqlite2 import dbapi2 as sqlite3
try:
    import json
except ImportError:
    import simplejson as json
# Columns declared with type "json" are decoded with json.loads on the way out.
sqlite3.register_converter("json", json.loads)
conn = sqlite3.connect(":memory:",
    detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES)
c = conn.cursor()
c.row_factory = sqlite3.Row  # fields by name
d = conn.cursor()            # normal row
json_string = json.dumps(dict(a=1, b=[1, 2, 3]))
conn.execute('''
    create table snapshot(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        mydata json);
    ''')
conn.execute('''
    insert into snapshot values
    (null, ?)''', (json_string,))
R1 = c.execute("select * from snapshot").fetchone()['mydata']
R2 = d.execute("select * from snapshot").fetchone()[1]
R3 = conn.execute("select * from snapshot").fetchone()[1]
assert R1 == R2 == R3 == {'a': 1, 'b': [1, 2, 3]}, "all should be equal"
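The first drawback above, maintaining your own index tables for queryable parts of the jlob, might look something like this sketch. It assumes a single hypothetical queryable element, an `owner` field inside each jlob, mirrored into a side table; `snapshot_owner`, `save_snapshot`, and `snapshots_for_owner` are made-up names. Writing the jlob and its index row in one transaction keeps the two in sync, leaning on sqlite's atomic transactions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table snapshot(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        mydata json);
    -- hypothetical index table for one queryable element of the jlob
    create table snapshot_owner(
        snapshot_id INTEGER REFERENCES snapshot(id),
        owner TEXT);
    create index idx_snapshot_owner on snapshot_owner(owner);
""")

def save_snapshot(conn, data):
    """Store `data` as a jlob and update the owner index in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute("insert into snapshot (mydata) values (?)",
                           (json.dumps(data),))
        conn.execute("insert into snapshot_owner values (?, ?)",
                     (cur.lastrowid, data.get("owner")))
        return cur.lastrowid

def snapshots_for_owner(conn, owner):
    """Use the index table to find jlobs, then decode them from JSON."""
    rows = conn.execute("""
        select s.mydata from snapshot s
        join snapshot_owner o on o.snapshot_id = s.id
        where o.owner = ? order by s.id""", (owner,))
    return [json.loads(r[0]) for r in rows]

save_snapshot(conn, {"owner": "alice", "items": [1, 2]})
save_snapshot(conn, {"owner": "bob", "items": []})
save_snapshot(conn, {"owner": "alice", "items": [3]})
print(snapshots_for_owner(conn, "alice"))
```

If the jlob format later changes, only `save_snapshot` (and perhaps a one-off backfill of the index table) needs to follow; the snapshot table itself stays as it is.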
len() calls can be SLOW in Berkeley Database and Python bsddb.
September 26, 2008
In my day-to-day coding work, I make extensive use of Berkeley DB (bdb) hash and btree tables. They’re really fast, easy-ish to use, and work for the apps I need them for (persistent storage of json and other small data structures).
So, my Python code was having all kinds of weird slowdowns, and it was the len() call (of all things) that was causing the issue!
As it turns out, sometimes the Berkeley database does have to iterate over all keys to give a proper answer. Even the “fast stats” *number of records* call has to
References:
Jesus Cea’s comments on why bdbs don’t know how many keys they have
db_stat tool description
DB->stat API
Dumping and loading a bsddb, for humans.
September 26, 2008
Sometimes things happen with Python shelves that screw up the bsddbs (Berkeley DB [bdb] databases*) that power them. A common way for this to happen is when two apps have a shelf open for writing and something goes flooey, such as both trying to write to the same page. The bsddb emits this helpful error:
DBRunRecoveryError: [Terror, death and destruction will ensue] or something equally opaque and non-reassuring
So how do you run the recovery, eh? Assuming you have the db_dump and db_load tools on your platform, take hints from the Library and Extension FAQ and try this bash snippet:
#!/usr/bin/env bash
## example usage:
## $ bdb_repair /path/to/my.db
function bdb_repair {
    BDIR=$(dirname "$1")    # /path/to/dir
    BADDB=$(basename "$1")  # bad.db
    cd "$BDIR" || return 1
    cp "$BADDB"{,.bak} &&                        # seriously! back it up first
    db_dump -f "$BADDB.dump" "$BADDB" &&         # might take a while
    db_load -f "$BADDB.dump" "$BADDB.repaired" &&
    mv -f "$BADDB.repaired" "$BADDB"             # only replace if every step worked
    cd -
}
So far, I’ve had universal success with this method.
If any bash gurus want to improve the error handling here, I’d appreciate it.
FOOTNOTES
* Yes, I know this is redundant.
The 100 Doors Puzzle in R at Rosetta Code
September 16, 2008
From time to time I see a puzzle at Rosetta Code that interests me, and I post an R solution for it. This time it was the 100 doors puzzle.
Problem: You have 100 doors in a row that are all initially closed. You make 100 passes by the doors. The first time through, you visit every door and toggle it (if the door is closed, you open it; if it is open, you close it). The second time, you only visit every 2nd door (door #2, #4, #6, …). The third time, every 3rd door (door #3, #6, #9, …), etc., until you only visit the 100th door.
Question: What state are the doors in after the last pass? Which are open, which are closed?
The code for this in R is pretty simple:
# UNOPTIMIZED
doors_puzzle <- function(ndoors=100, passes=100) {
  doors <- rep(FALSE, ndoors)
  for (ii in seq(1, passes)) {
    mask <- seq(0, ndoors, ii)  # every ii-th door; R silently ignores index 0
    doors[mask] <- !doors[mask]
  }
  return(which(doors == TRUE))
}
doors_puzzle()
## optimized version: each door is toggled once per divisor, and only perfect
## squares have an odd number of divisors, so we only have to go up to the square root of 100
seq(1, sqrt(100))**2
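For readers following along in Python rather than R, here is a sketch of the same brute-force simulation plus a check of the square-root shortcut. The shortcut rests on a counting argument: when passes >= ndoors, door k is toggled once for each divisor of k, divisors pair up as (d, k/d) except when d = k/d, so only perfect squares are toggled an odd number of times and end up open. The function names mirror the R version but are otherwise assumptions.

```python
import math

def doors_puzzle(ndoors=100, passes=100):
    """Brute force: on pass `step`, toggle doors step, 2*step, 3*step, ..."""
    doors = [False] * (ndoors + 1)  # index 0 unused, so door k is doors[k]
    for step in range(1, passes + 1):
        for k in range(step, ndoors + 1, step):
            doors[k] = not doors[k]
    return [k for k in range(1, ndoors + 1) if doors[k]]

def open_doors_shortcut(ndoors=100):
    """Perfect squares up to ndoors (valid when passes >= ndoors)."""
    return [i * i for i in range(1, math.isqrt(ndoors) + 1)]

assert doors_puzzle() == open_doors_shortcut()
print(doors_puzzle())  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```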
Monty Hall in R
September 13, 2008
Inspired by paddy3118, I decided to write up a Monty Hall simulation in R for Rosetta Code. Enjoy!
… The rules of the game show are as follows: After you have chosen a door, the door remains closed for the time being. The game show host, Monty Hall, who knows what is behind the doors, now has to open one of the two remaining doors, and the door he opens must have a goat behind it…
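The R simulation itself is not reproduced here, but the rules quoted above translate directly into a short Monte Carlo sketch. This Python version is an assumption-laden stand-in, not the Rosetta Code entry: `monty_hall`, the trial count, and the seed are all illustrative choices.

```python
import random

def monty_hall(trials=100_000, seed=None):
    """Return the fraction of wins for staying and for switching."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(trials):
        doors = [0, 1, 2]
        car = rng.choice(doors)
        pick = rng.choice(doors)
        # Monty knows where the car is and opens a goat door the player didn't pick.
        opened = rng.choice([d for d in doors if d != pick and d != car])
        # Switching means taking the one door that is neither picked nor opened.
        switched = next(d for d in doors if d != pick and d != opened)
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return stay_wins / trials, switch_wins / trials

stay, switch = monty_hall(seed=1)
print(f"stay: {stay:.3f}  switch: {switch:.3f}")
```

Because Monty never opens the car's door, exactly one of "stay" and "switch" wins in every trial, so the two fractions always sum to 1; the simulation should land near 1/3 for staying and 2/3 for switching.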