An IanB-tastic day, or How A Bug Becomes a Fix
Posted: April 2, 2010 Filed under: javascript, programming, Python | Tags: development process
People don’t write enough about how they catch, report, and fix bugs. I hope others will follow my lead in exposing the process more.
- Tried to read IanB’s revised webob tutorial at http://pythonpaste.org/webob/do-it-yourself.html …
- where I got annoyed by how hard it is to copy and paste the code with the “>>>” and “…” symbols. At the first example!
- Since Python-Sphinx is the issue, spent some time in #python-docs discussing solutions with Taggnostr, including ones that other code highlighters use.
- Thought about how IanB probably likes that the code isn’t just cut and pastable, since typing it in yourself is much better for learning. Decided that I didn’t care!
- Built on Taggnostr’s jquery-based fix on the installed doctools.js file.
- Branched and checked out Sphinx from BitBucket to work on it more formally…
- where I promptly made a mess of things. I don’t know jquery or javascript very well, so there was a lot of fussing. The problem was that between my version and tip, underscore.js was added, so using the new doctools.js file in my generated sphinx html tree was causing some silent errors! Boy, JS seems to be hard to troubleshoot, and not very good at failing loudly!
- After finishing my fixes, checked in the fix to BitBucket and pushed.
- Made a pull-request, where I discovered that when you pull on your own branch at BB, it sends the request to you. This seems, hm…, unintuitive!
- After asking about it at #mercurial who answered, and #bitbucket who didn’t, people agreed that this was, um, odd behaviour.
- So, bug time at BitBucket, where after a search, I found Bug 681… (http://bitbucket.org/jespern/bitbucket/issue/681/master-repositories-dont-need-pull-request)
- …which was filed by IanB!
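For the curious, the copy-and-paste annoyance can also be worked around after the fact. This is not the jquery fix discussed above, just a minimal Python sketch of the same idea: drop the prompts and the interpreter output from a pasted session. The function name is made up for illustration, and it assumes prompts are always followed by a single space.

```python
def strip_doctest_prompts(text):
    """Keep only the code from a pasted interpreter session.

    Hypothetical helper: lines starting with a prompt are kept (minus
    the prompt); everything else is treated as output and dropped.
    """
    code_lines = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith((">>> ", "... ")):
            code_lines.append(stripped[4:])   # remove prompt plus one space
        elif stripped in (">>>", "..."):
            code_lines.append("")             # bare prompt: blank line
        # anything else is interpreter output, so skip it
    return "\n".join(code_lines)


session = ">>> for i in range(2):\n...     print(i)\n...\n0\n1\n>>> x = 1\n"
print(strip_doctest_prompts(session))
```

One caveat: output that itself starts with “... ” would be mistaken for code, so treat this as a convenience, not a parser.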
Whython – Python For People Who Hate Whitespace
Posted: April 1, 2010 Filed under: programming, Python, whython 32 Comments
Whython : Whitespace Haters Python
Example
Clearly Confusing (standard 3.x):
for ii in range(10):
    print(ii)
    print("which is %s" % (['even','odd'][ii % 2]))
Improved:
for ii in range(10) {
print(ii);
print("which is %s" % (['even','odd'][ii % 2]));
}
Maximum Enterprise Whythonic:
for ii in range(10) { print(ii); print("which is %s" % (['even','odd'][ii % 2])); }
How about some Scheme with your Python?
defun myfun(): return 1
assert myfun() == 1
Or add some Ruby shine?
def myfun() BEGIN return 1; END
assert myfun() == 1
Why Whython?
- Less Whitespace, More Enterprise
- It’s not a real language without braces and semi colons
- Whitespace delimited is like so restrictive, man!
- Python sucks for code golf
- Finally, a Python for everyone who can’t decide between tab and space
- Possibly (as in the mathematical sense – a small non-zero probability) useful for doing command line one liners in python
- Help determine how bad a PEP/development idea needs to be before someone gets kickbanned from #python-dev.
More seriously
- reading the Dragon Book [Aho86] gives a person dangerous ideas
- good excuse to deep dive into the python interpreter source code and the AST, dis modules
- finally wanted to learn GDB and python -d debug mode
- humoring trolls is fun
- for education, the whitespace thing really can cause problems. When copying code out of books into IDLE or IPython, there are corner cases when it terminates blocks “too early”, confusing new learners.
- preparation for the “Python Spring Cleaning” project, to see how hard it is to get and modify source, write a PEP, raise bug ideas, talk in irc, etc.
- since this is unlikely to ever be adopted by Python (I hope!), it will remain a useful exercise, unlike other “bugs” which get fixed once and for all
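The “too early” corner case from the education point can be reproduced without a book or IDLE. Here is a small sketch using the standard library’s codeop module, which is what the plain interactive console uses to decide whether a statement is finished. The `classify` helper is my own name, and IPython has its own input handling, so this only shows the stock behaviour.

```python
import codeop


def classify(source):
    """Report how the interactive compiler sees a chunk of typed input."""
    try:
        code = codeop.compile_command(source, "<input>", "single")
    except SyntaxError as exc:
        return "error: " + type(exc).__name__
    # compile_command returns None when more input is expected
    return "incomplete" if code is None else "complete"


typed = "def f():\n    x = 1"
print(classify(typed))           # still inside the block
print(classify(typed + "\n"))    # a blank line (say, pasted from a book) ends it
print(classify("    return x"))  # the rest of the function now arrives alone
```

So a function body that contains a blank line gets cut in two when pasted, and the learner sees an error on perfectly good code.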
Want It? (Download and Install)
Are you sure you can handle this level of awesome? Okay! Download and install:
http://bitbucket.org/gregglind/python-whython3k/src/
## Get the source!
$ hg clone https://gregglind@bitbucket.org/gregglind/python-whython3k/
# or if you haven't jumped on the `Mercurial <http://mercurial.selenic.com/wiki/Tutorial>`_ bandwagon
# then: wget http://bitbucket.org/gregglind/python-whython3k/get/79a2c77fe3e1.zip and unzip it!
$ cd python-whython3k
$ ./configure # go make a pot of tea
$ make # go watch an episode of the `IT Crowd <http://www.netflix.com/WiMovie/The_IT_Crowd_Series_1/70113774>`_
$ ./whython # beautiful failure begins
Limitations
- only simple_stmt are really usable in this way. That means that
blocks (functions, if, else, etc.) can’t be nested inside a braced block.
Thanks to
- The Authors of PEP 306
- GVR, Martin v. Loewis (my umlaut is misbehaving!), Georg Brandl, Greg Ewing, Jeremy Hylton and others on the Python-Dev mailing list
- Fred Drake, for responding to my crazy and incoherent email
- gutworth, merwok, __ap__ and others in #python-dev
References
[Aho86] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. http://www.amazon.com/exec/obidos/tg/detail/-/0201100886/104-0162389-6419108
You Know, For Kids!
Posted: March 22, 2010 Filed under: programming, teaching
Swear words are a powerful motivator for novice programmers, and an unfortunate byproduct of advanced ones.
Teaching the computer to swear at you in random, creative ways is a powerful experience. It’s easy in most modern operating systems that include scripting languages (thanks for nothin’, Windows!*). So, you out there with the kids, the friends who want to learn, the curious, pass the magical power of curse words along.
* Yes, it’s even possible on Windows thanks to vbscript! On Mac / Linux / Unix, use your favorite: Python / Ruby / Bash / ….!
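To make that concrete, here is roughly what a first version might look like in Python. It is only a sketch: the word lists are tame placeholders I made up, and the whole point is that the learner swaps in their own.

```python
import random

# Placeholder vocabulary: replace with words of your own choosing.
ADJECTIVES = ["soggy", "clueless", "moldy", "wobbly"]
NOUNS = ["turnip", "doorknob", "sock puppet", "blobfish"]


def insult(rng=random):
    """Build one random insult from the two word lists."""
    return "You %s %s!" % (rng.choice(ADJECTIVES), rng.choice(NOUNS))


print(insult())
```

Lists, a function, string formatting and randomness, all in a program a kid will actually want to run again.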
(inspired by Juliet at how-do-the-young-start-programming-nowadays)
No Geek Bulls**t Programming Class (Results so Far)
Posted: October 9, 2009 Filed under: programming, Python, teaching 3 Comments
The Project
Create an accessible ‘learn to program’ class, using Python. Undo damage and barriers to access around geek culture, endemic sexism and racism, and models that say that “only certain people can program”.
Results So Far
So far there have been two class sessions. The gender mix (self-identified) is about 50/50/0 male/female/(genderqueer, intersex) and we have 10 students or so. The self-identified goals of students included: building programs for work, changing careers, remedying previous bad programming class experiences, (rarer) learning python specifically (after knowing some other language).
Lessons Learned (and some Theories)
# Make the class accessible
- No alpha male bulls**t
- No pissfighting over languages, programming backgrounds, etc. Remember, even experts start as newbies.
- Create safer, accessible spaces (physically accessible; make childcare credits available; advertise to underserved communities; avoid gender / sexuality assumptions; respect pronouns; enforce safer space).
# emotions matter in the learning experience
- acknowledge the complexity of programming
- programmers are made, not born
- programming is hard to do, hard to learn
- explain that it was hard for you to learn as well.
- remind learners that making mistakes is how one learns to program
# Start far back. Go back further. Most students know little about how the computer works.
- they haven’t seen / heard of / used the command line / terminal
- they don’t know the difference between the shell and the python environment
- they try things like “>>> python program.py”
- there will be mac and windows users, prepare for both
- some learners will have programmed before, some will not
# Have a goal / main project for the course
- connect with students.
- build toward a full project
- lessons should iteratively replace / improve / expand on code made during previous lessons
- no math. Math algorithms are boring and irrelevant for most people. Python makes strings easy, and easy strings make for real-world data that is easy to discuss
# Don’t get bogged down in syntax. People don’t care. Python has awesome syntax, mostly.
- Gloss over warts and complexities
- Avoid jargon
# Don’t get bogged down in datatypes. Don’t mention unicode. Ignore tuples.
- Do mention strings, “numbers” (encompassing ints and floats)
- dictionaries before lists. Associating keys and values parallels associating variable names with values. After teaching dicts, lists are trivial.
# relate functions and data structures. They are intertwined and need to be taught in parallel.
- Functions exist to process data structures, and data exist to feed functions
# Ignore Objects and Object-Oriented Programming
- OO isn’t hard, but it is confusing, especially for newbies
- More importantly, it’s *irrelevant* for most early programming tasks
# Now matters more than Complete
- Use Wikibooks or Google Docs for ease in sharing materials. (if repeating, we might choose GDocs — Wikibooks is too much machinery)
- Don’t worry about getting all the details right
# POWERPOINT IS DEATH
When Great Features Aren’t Enough: Twisted, Tornado, the Zero-Step, and Activation Energy
Posted: September 12, 2009 Filed under: programming, Python | Tags: Python, tornado, twisted, usability, web.py 8 Comments
Fresh on the heels of Tornado’s release, and Glyph’s response to it (note 1) and others, I’ve been thinking about why Tornado so excites me.
Twisted is a robust, powerful, scalable asynchronous web framework (among other things). We have used it successfully in the past. Taking them at their word, Tornado is scalable, but focused on http and much less fully featured than Twisted, though it does provide authentication pieces (awesome!) and some other utilities. In architectural terms, Glyph is probably right that Tornado is incomplete (to be polite).
I still want to use Tornado.
Baby Steps into HBase
Posted: July 15, 2009 Filed under: db, programming, unix | Tags: db, hbase
Today, after reading (the amazing and invaluable!) Understanding HBase and BigTable, while researching schemas for Google App Engine, I took my first tentative steps into using HBase. About HBase:
HBase is the Hadoop database. Its (sic) an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.
HBase’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Try it if your plans for a data store run to big.
Well, my plans don’t run to big, but they do run to indexed over time. Since every cell in an HBase table has a timestamp, it makes it really easy to snapshot data over time, and “rollback” a query as though it was asked at any point in the past. For data that changes rarely over time, but for which one wants a historical record, this might make querying with history much simpler.
Historical Data Example
Think about how an organization changes over time. Employees enter and leave, business units might be bought and sold. One approach to modeling this is to take a snapshot every day, and store that in an RDBMS. The snapshots will have a lot of redundant information, since an org doesn’t really change very much.
A simpler model is to enter a new snapshot of the organization only when it changes, essentially overwriting the previous configuration. Since HBase automatically labels cells with a timestamp, this comes for free.
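To show what “snapshot only on change, then ask as of any date” means, here is a toy sketch in plain Python. It is not the HBase API; the `VersionedCell` class and its methods are invented for illustration, and it assumes writes arrive in timestamp order.

```python
import bisect


class VersionedCell:
    """Toy stand-in for one timestamped cell: keeps every distinct write."""

    def __init__(self):
        self._timestamps = []   # sorted, one entry per stored version
        self._values = []

    def put(self, timestamp, value):
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("this sketch assumes writes arrive in time order")
        if self._values and self._values[-1] == value:
            return              # nothing changed, so store nothing
        self._timestamps.append(timestamp)
        self._values.append(value)

    def get(self, as_of):
        """Return the value as it stood at time as_of, or None if too early."""
        i = bisect.bisect_right(self._timestamps, as_of)
        return self._values[i - 1] if i else None

    def version_count(self):
        return len(self._values)


org = VersionedCell()
org.put(1, {"alice": "sales"})
org.put(2, {"alice": "sales"})                 # unchanged: not stored again
org.put(30, {"alice": "sales", "bob": "ops"})  # bob joins on day 30
print(org.get(15), org.version_count())
```

The daily-snapshot approach would have stored thirty copies here; this stores two and can still answer for any day.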
Setting it up
Using Ole-Martin Mørk’s instructions was a breeze! Even though I know almost nothing about Java and the Java environment, I managed it. I followed them, with these modifications:
- After downloading, unzipping, and symbolic linking to ~hbase, I version control the whole thing ($ git init; git add *; git commit -m "initial checkin, as unpacked from source"), so that if I foul up anything, I can easily revert!
- Edit ~hbase/conf/hbase-env.sh to have the right “JAVA_HOME” which for me (Debian) is -> export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
Since I don’t have passwordless ssh set up to local host, I get this error:
~/hbase$ ~/hbase/bin/start-hbase.sh
localhost: ssh: connect to host localhost port 22: Connection refused
The rest of the example seems to run fine though, and I’m in no mood to really track this down, since I’m still in the experiment phase.
Future Steps
I’m not sure whether I’ll be going any deeper anytime soon, since I have a lot of SQLAlchemy code built around handling these sorts of ‘historical’ queries (where inserting and updating are the real difficulties!), but I do like the idea of easily versioned, map-like data stores quite well.
Lemon Candy and Dynamic Programming
Posted: December 22, 2008 Filed under: programming, rosetta code | Tags: calc, dynamic programming, openoffice
Over at TheDailyWtf, hidden among some comments was an interesting dynamic programming problem:
Consider this problem:
George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.
Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.
But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he’ll only put a piece of candy back at most once per day.
What are the odds that when one piece of candy remains, it will be lemon flavored?
I posed the problem at a company where I used to work. All but one person tried to do it recursively. The remaining person tried to do it using an Excel spreadsheet!!!
Maybe I (and some of the other posters on that thread) are morons, but Excel (or in my case, OpenOffice) seemed like a fine way to solve it, so I did.
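For comparison with the spreadsheet, the same table can be filled in a few lines of Python. This is a sketch of the dynamic programming idea rather than what the spreadsheet literally does; the function name is mine. On any day with `lemon` and `cherry` candies left, George eats a cherry only if he draws cherry twice in a row, so that happens with probability (cherry/total)², and he eats a lemon otherwise.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def p_last_is_lemon(lemon, cherry):
    """Probability that the final remaining candy is lemon."""
    total = lemon + cherry
    if total == 1:
        return Fraction(lemon)              # 1 if it is lemon, 0 if cherry
    # A cherry is eaten only when both draws of the day come up cherry.
    p_eat_cherry = Fraction(cherry, total) ** 2
    p_eat_lemon = 1 - p_eat_cherry
    result = Fraction(0)
    if lemon:
        result += p_eat_lemon * p_last_is_lemon(lemon - 1, cherry)
    if cherry:
        result += p_eat_cherry * p_last_is_lemon(lemon, cherry - 1)
    return result


answer = p_last_is_lemon(90, 10)
print(float(answer))
```

The memoization cache plays the role of the spreadsheet grid: each (lemon, cherry) cell is computed once from its two neighbours.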
Read more about the Lemon-Cherry problem, and download the spreadsheets used to solve it
Git-svn clone the last few revisions
Posted: December 15, 2008 Filed under: git, programming 3 Comments
It can be awfully tempting to make some changes to an existing open-source project [1]. Some of that excitement diminishes when one realizes how long a git-svn clone will take on a large project repo, like Python. What git-svn gives you in quick history lookup, you pay for up front.
Instead, we can do a “shallow-copy” to get the last few revisions. It seems that you need to use actual revision numbers for the first argument to -r, but I could be wrong. I tried using HEAD~1000:HEAD.
$ git-svn clone http://svn.python.org/projects/python/trunk/ python-dev -r 65000:HEAD
If you find this is *still* taking too long, try canceling, changing into the directory, and issuing:
$ git svn fetch
Good luck all!
Notes
[1] Finally got my first one into python, #4568: remove limitation in varargs callback example.
Simple “object-db” using JSON and python-sqlite
Posted: December 5, 2008 Filed under: json, modules, programming, Python | Tags: centos, oodb, python-sqlite2, rpm, sqlite 6 Comments
As part of a much larger project, I have a group of “snapshots” of a complicated data structure. I need to save these in a persistent way, and continue to have access to them, when needed. My solution is to output the snapshots as JSON, and store them into a sqlite database*, where they will be persistent on disk as “jlobs” (json large objects).
This “sqlite as object-db” has several advantages:
- atomic transactions,
- easy database replication,
- jlob can easily change format without affecting schema
- very light runtime requirements.
Building off of the sqlite3 manual, it is easy to see how to extract the json back *out* of the database.
There are drawbacks to this approach, of course:
- you’re responsible for building and maintaining tables indexing any queryable elements of your jlob, if you want to be able to access them using SQL.
- sql normalization purists will throw up when they look at your schema
(*Note: if you are on centos 5, and do not have access to Python 2.5, make sure that you install python-sqlite2 (for example, from one of these rpms) rather than updating your python-sqlite in place. Otherwise BAD THINGS WILL HAPPEN, including breaking yum.)
#!/usr/bin/python
import sys

if sys.version_info >= (2,5):
    import sqlite3
else:
    from pysqlite2 import dbapi2 as sqlite3

try:
    import json
except ImportError:
    import simplejson as json

sqlite3.register_converter("json", json.loads)

conn = sqlite3.connect(":memory:",
    detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
c = conn.cursor()
c.row_factory = sqlite3.Row  # fields by name
d = conn.cursor()  # normal row

json_string = json.dumps(dict(a=1, b=[1,2,3]))
conn.execute('''
    create table snapshot(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        mydata json);
    ''')
conn.execute('''
    insert into snapshot values
    (null, ?)''', (json_string,))

R1 = c.execute("select * from snapshot").fetchone()['mydata']
R2 = d.execute("select * from snapshot").fetchone()[1]
R3 = conn.execute("select * from snapshot").fetchone()[1]
assert R1 == R2 == R3 == {'a': 1, 'b': [1, 2, 3]}, "all should be equal"
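The first drawback, maintaining your own index tables, is less scary than it sounds. Here is one way it might look, as a sketch only: the `snapshot_index` table, its column names, and both helper functions are my own invention for illustration, not part of the project described above.

```python
import json
import sqlite3


def make_db():
    """Create the jlob table plus a hypothetical side table for lookups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table snapshot ("
                 "id INTEGER PRIMARY KEY AUTOINCREMENT, mydata TEXT)")
    conn.execute("create table snapshot_index ("
                 "snapshot_id INTEGER, field TEXT, val TEXT)")
    conn.execute("create index snapshot_index_fv "
                 "on snapshot_index (field, val)")
    return conn


def save_snapshot(conn, obj, indexed_fields):
    """Store obj as JSON and index the chosen top-level fields, atomically."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("insert into snapshot (mydata) values (?)",
                           (json.dumps(obj),))
        snapshot_id = cur.lastrowid
        for field in indexed_fields:
            if field in obj:
                conn.execute("insert into snapshot_index values (?, ?, ?)",
                             (snapshot_id, field, json.dumps(obj[field])))
    return snapshot_id


def find_snapshots(conn, field, value):
    """Return every stored object whose indexed field equals value."""
    rows = conn.execute(
        "select s.mydata from snapshot s "
        "join snapshot_index i on i.snapshot_id = s.id "
        "where i.field = ? and i.val = ? order by s.id",
        (field, json.dumps(value)))
    return [json.loads(row[0]) for row in rows]


conn = make_db()
save_snapshot(conn, {"owner": "alice", "b": [1, 2, 3]}, ["owner"])
save_snapshot(conn, {"owner": "bob", "b": []}, ["owner"])
print(find_snapshots(conn, "owner", "alice"))
```

The jlob can still change shape freely; only the fields you choose to index are tied to the schema.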
Len() calls can be SLOW in Berkeley Database and Python bsddb.
Posted: September 26, 2008 Filed under: modules, performance, programming, Python | Tags: bdb, bsddb3, oracle
In my day-to-day coding work, I make extensive use of Berkeley DB (bdb) hash and btree tables. They’re really fast, easy-ish to use, and work for the apps I need them for (persistent storage of json and other small data structures).
So, this python code was having all kinds of weird slowdowns for me, and it was the len() call (of all things) that was causing the issue!
As it turns out, sometimes the Berkeley database does have to iterate over all keys to give a proper answer. Even the “fast stats” *number of records* call has to
References:
Jesus Cea’s comments on why bdb’s don’t know how many keys they have
db_stat tool description
DB->stat api