Monday, May 10, 2010

Python Workshop

It's that time of the year again. Time to learn Python! I will be teaching a Python tutorial May 24-28. The tutorial will cover Python for users with little or no programming experience. Examples are geared toward biologists, but most concepts are applicable to all. Critiques and comments of the workshop tutorial material are greatly appreciated.

Register for the workshop.

Thursday, April 22, 2010

Django + Dojo

When I work on HTML projects, I usually use the Dojo Toolkit for my Javascript needs. Lately I've been spending some time playing around with the Python web framework Django. I did some internet searching and found Dojango, a project that integrates Django with Dojo. Dojango has features for automatically turning Django form fields into Dijits (Dojo UI widgets), but unfortunately Dojango uses Dojo's custom HTML attributes with Dojo's parseOnLoad option. I prefer to create Dijits programatically so that my markup stays clean. I decided to develop a Django app to meet my needs.

Code is available from SVN. Instructions are below.

Instructions:

Settings

# Setup app in settings.py

# Required attributes:

# The URL to get dojo.js from
DOJO_URL = MEDIA_URL + 'js/dojo'

# Optional attributes:

# Set to True to add Dojo setup to template
DOJO_ENABLED = True

# Set Dojo theme
DOJO_THEME = 'tundra'

# Set the value of djconfig
DOJO_DJCONFIG = {'isDebug': False, 'parseOnLoad': False,
'modulePaths': {'app': MEDIA_URL + 'js/app'}}

# More on this later
DOJO_FORM_FUNCTION = None

# Attach middleware
MIDDLEWARE_CLASSES = (
'dojo.middleware.DojoMiddleware',
...
)



The dojo object

The middleware attaches a Dojo object to each request. You can access the object from your views, and use it to set Dojo parameters.

# The dojo object has several useful attributes and methods.

# The path to dojo.js (from settings.DOJO_URL), read-only
request.dojo.src

# Theme
request.dojo.theme = 'soria'

# DjConfig
request.dojo.dj_config['isDebug'] = True

# Convenience method to set module paths in dj_config
request.dojo.set_module_path('custom', 'url_to_custom_module')

# Add stylesheets
request.dojo.append_stylesheet('url_to_custom_stylesheet')

# Require modules
request.dojo.append_module('module.to.require')

# Set function to addOnLoad
request.dojo.append_aol('function() {do_something();}')

# Set function to addOnLoad before other already set
request.dojo.prepend_aol('function() {do_something();}'



Forms

Django forms can easily be 'dijitized'.

from dojo import models as dojo

class Register(forms.Form):
username = forms.RegexField(
label='Choose a username (letters and numbers only)',
min_length=2,
max_length=16,
regex=r'^[\w]{2,16}$',
error_messages={
'invalid': 'Username must be 16 characters or shorter, and can only'
' contain letters, numbers, underscores and dashes.'
}
)

# Use the dojo_field function to attach dijit
# parameters to a Django form field.
dojo.dojo_field(username, 'dijit.form.ValidationTextBox')



"""
dojo_field arguments

required
==========
* field - Django field to attach dijit parameters to.
* dojo_type - str, the qualified name of the dijit class to use

keyword
=========
* attr_map - dict, Used to map Django field parameters to Dijit parameters.
Overrides the default values in dojo.models.default_attr_map.
The dict elements should be structured as follows:

Key == Django attribute name
Value == tuple with elements:
[0] == Dijit attribute name to map to
[1] == None, or callable to convert Django value to Dijit value

EXAMPLE:
{
'max_length': ('maxLength', None),
'regex': ('regExp', lambda a: '%s' % a.pattern)
}
* dojo_attrs - dict, Attributes will be applied directly to dijit.
Key == dojo attribute name
Value == dojo attribute value
"""
}


After creating a form, it must be instrumented to create the Javascript required to create the dijits.



def my_view(request):

my_form = Register()

# This call generates all the necessary Javascript code
request.dojo.dojo_form(my_form)

# By default, the function code generated
# is a string to be added in-line.
#
# If you prefer to call a pre-defined JS function,
# just set the request.dojo.form_function attribute.
#
# The value of the attribute should be a tuple where:
# [0] == qualified Dojo module name where function exists
# [1] == function name
#
# request.dojo.form_function can also be set automatically
# by setting DOJO_FORM_FUNCTION in settings.py


Template

Include the following tags within the 'head' tag of your HTML template:


{% load dojo %}
{% dojo request.dojo %}


The Dojo app also includes a script for creating a Dojo build:


python manage.py dojo_build

Saturday, March 6, 2010

UI Design

In the world of scientific software, UIs tend to be horrifically awful. When we started working on our next generation LIMS (called SLM: Sample Lifecycle Manager), one of our goals was to provide an intuitive UI. This post covers some of what we went through on the UI design front.

First Try

After evaluating several Javascript libraries, we decided that Flex would be a good choice, because it would be easy to create a desktop-like application to manage the large feature set. The main screen for our 1st alpha release looked something like this:



The design is similar to a complex desktop app such as Thunderbird or Eclipse. The app has a menu bar, a left panel with data/navigation, a bottom panel with several features, and a main tabbed panel with content.

Our users hated it. We quickly learned that our users don't want desktop-style apps with many features spread across multiple panels. Instead, they are looking for web-style apps where each function of the application can be performed on a single, simple screen, and links are used to navigate between feature screens.

Second Try

Our revised UI eliminated the menu bar. The left and bottom panels are still available for power users, but they are now hidden by default. These simple changes dramatically improved the user's acceptance of the app.



Forms

Flex's form system is great for building complex forms (as long as you can live with the standard theme). Flex has built-in validator classes that can provide real-time visual feed back to users as they enter information into the form. It is very important to customize your validators to provide useful messages to your users, so they know how to correct their input.



Always make sure your forms are fully navigable by keyboard, and make sure that by default, the form will be submitted when the user hits 'Enter'.

SLM has many features, and some of the forms rely on optionally selected fields. For example, if a user selects an option from a drop-down list, then other fields in the form become enabled or disabled. Many desktop-style apps 'grey-out' the disabled fields (this can be done using Flex UIComponent's 'enabled' property), but we found that un-usable fields that remain visible can be confusing for users. Instead, adding or removing the optional fields feels much more intuitive.







Visual Cues

Animations are not just eye-candy, they can also help users navigate through the system. For example, SLM has several drag-and-drop features. At first, some users were unsure about which items could be dragged into a drop-zone. To fix this problem we added a simple glow animation to highlight draggable items whenever the user's mouse is hovering over the drop-zone.

Trust

Customers use SLM to submit samples for testing at our facility. The submission process is similar to a shopping cart check-out on a retail site. It is extremely important to allow a user the ability to move forward and backward through the process. User's don't feel confident unless they trust the system to move back and forth between steps without munging any data they've already entered. If a user needs to change an option in step 2, it shouldn't affect the data they've already entered in step 3. Users must also be confident that nothing will be stored on the server until the final submit step.



For single step operations, the UI must always provide a 'Cancel' option or some other way for a user to back-out of the changes they've made.

Tuesday, November 24, 2009

Profiling and Optimizing Python Code

All programmers have heard the advice: "Don't prematurely optimize code." What exactly does that mean? It means that you shouldn't guess about what is causing your code to run slowly. Chances are you'll guess wrong. Instead of guessing, use profiling and benchmarking tools to quickly and accurately identify the performance bottlenecks in your scripts.

Every now and then I run into a piece of Python code that just doesn't run as fast as I would like. I use 3 different Python tools to find and fix Python performance problems.

cProfile:

cProfile is a module that is included in the Python Standard Library. It logs function calls and execution times. There is also a pure-python version named 'profile'.

KCachegrind:

KCachegrind was written to visualize the output generated by Callgrind (a C profiler), but the function call logs from cProfile can be converted to the KCachegrind format with this script. KCachegrind uses the KDE framework. On Linux boxes, KCachegrind is usually bundled in a package with other development tools written for KDE. The package is named 'kdesdk' or something similar.

timeit:

timeit is a module included in the Python Standard Library that is used for measuring the execution time of arbitrary pieces of code.

Here is the code that needs to be improved:


class LameStringBuilder(object):
def __init__(self, cols=40, rows=10000):
self.cols = cols
self.rows = rows

def build(self, val):
built = ''
for i in range(self.rows):
built += self.build_row(val) + "\n"
return built

def build_row(self, val):
built = ''
for i in range(self.cols - 1):
built += val + ','
built += val
return built


Here is the code to execute a profile test and format the results into something that KCachegrind can understand:


import cProfile

# Use this module to convert profile data
# into the KCacheGrind format.
#
# The module is available here:
# http://www.gnome.org/~johan/lsprofcalltree.py
import lsprofcalltree

import lame_string_builder

def test():
"""Code to profile."""
l = lame_string_builder.LameStringBuilder(40, 10000)
l.build('foo')

if __name__ == "__main__":
# Profile function
p = cProfile.Profile()
p.run('test()');

# Get profile log in KCacheGrind format
# and dump to file.
k = lsprofcalltree.KCacheGrind(p)
out = open('lame_profile.kgrind', 'w')
k.output(out)
out.close()


After running the test, launch KCachegrid and open the profile file to view the results.



The 'Flat Profile' and 'Callee Map' are the 2 most useful displays for finding the bottle necks in your code.

The 'Flat Profile' describes each function called in the test with the following columns:

  • Incl. - The total percentage of time spent within a function.

  • Self - The percentage of time spent within a function NOT including inner function calls.

  • Called - The total number of times the function was called during the test.

  • Function - The name of the function being described.



The 'Callee Map' represents the different functions called during the test. Each function is drawn as a rectangle. Inner function calls are drawn on top of their parent function's rectangle. The area used to draw each function is proportional to the execution time for each function call relative to the total execution time of the parent (also printed as a percentage in the graph).

In the example above it is easy to pick out where the bottle necks are by looking at the 'Callee Map'. Most of the test time was spent within the 'LameStringBuilder.build_row' method, so it is the main bottle neck. Other significant sources of time include calls to 'LameStringBuilder.build' and the built-in function 'range'.

After identifying the bottlenecks, I created a better version of the 'LameStringBuilder' class:


import lame_string_builder

class LessLameStringBuilder(lame_string_builder.LameStringBuilder):
def build(self, val):
return "\n".join([self.build_row(val) for i in xrange(self.rows)])

def build_row(self, val):
return ",".join([val for i in xrange(self.cols)])


After making changes to the code, Python's built-in 'timeit' module can be used to simply and accurately measure the differences in execution speed between the old and the new versions:


import timeit

import lame_string_builder
import less_lame_string_builder

def test_lame(val, cols, rows):
l = lame_string_builder.LameStringBuilder(cols, rows)
l.build(val)

def test_less_lame(val, cols, rows):
l = less_lame_string_builder.LessLameStringBuilder(cols, rows)
l.build(val)

def test_generic(repeat, name, args):
print "%s (x%i):" % (name, repeat)
t = timeit.Timer("%s(%s)" % (name, args), "from __main__ import %s" % name)
print t.timeit(repeat)

def test_all(repeat, val, cols, rows):
args = "'%s', %i, %i" % (val, cols, rows)

for name in ("test_lame", "test_less_lame"):
test_generic(repeat, name, args)

if __name__ == "__main__":
test_all(100, 'foo', 40, 10000)




The new code is significantly faster!

Monday, November 2, 2009

Role based security with Python

Most moderately complex enterprise applications require some sort of role based security and access control. For example, an employee should have access to fill-out her time card, but a manager should also be able to approve the time card. This post outlines how to create an 'access control list' (acl) for Python.

The acl is an object that can be queried to determine if a particular role has permission to access a resource. Each permission is also associated with a 'privilege'. For example, a manager may have both the 'read' and 'write' privileges for the weekly schedule, but an employee only has the 'read' privilege.


import acl

# Each user name should be associated with a role.
role = get_role(username)

# Query the acl to see if a role has access to a resource.
# params: role name, resource, privilege
acl = acl.Acl()
if not acl.check_access(role, 'time_card', 'write'):
# User doesn't have access to this resource!!
pass


The Acl class supports role inheritance. If the 'manager' role inherits the 'employee' role, then managers will have all of the same permissions as employees.


# Build acl list
acl = acl.Acl()

employee = acl.Role('employee')
resource = acl.Resource('time_card')
resource.set_privilege('write')
employee.set_resource(resource)
acl.set_role(employee)

manager = acl.Role('manager')
manager.set_parent(employee)
resource = acl.Resource('time_card')
resource.set_privilege('approve')
manager.set_resource(resource)
acl.set_role(manager)

if acl.check_access('manager', 'time_card', 'write'):
# YES!
pass

if acl.check_acces('manager', 'time_card', 'approve'):
# YES!
pass

if acl.check_access('employee', 'time_card', 'write'):
# YES !
pass

if acl.check_access('employess', 'time_card', 'approve');
# NO!
pass



I'm normally not a very big fan of XML, but in this particular case it is a good format to use if you prefer to store your acl in a configuration file. The Acl object's 'build_acl' method populates the object from a XML file with the following format:


<?xml version="1.0"?>
<!--
- This file configures the roles (user groups)
- and permissions for accessing the system.
-->
<config>
<!--
- Setup roles here.
- Use the 'inheritFrom' tag to
- inherit permissions from another role.
-->
<roleSet>
<role>
<name>customer</name>
</role>
<role>
<name>employee</name>
<inheritFrom>customer</inheritFrom>
</role>
<role>
<name>manager</name>
<inheritFrom>employee</inheritFrom>
</role>
</roleSet>

<!--
- Set permissions for accessing application components here.
- resource -> property being access controlled.
- role -> group or user that can access resource.
- privilege -> privilege that role can use with resource.
-
- Each permission tag can contain multiple
- resources, roles, and privileges.
-->
<permissions>
<permission>
<resources>
<resource>contact_details</resource>
<resource>profile</resource>
</resources>
<roles>
<role>customer</role>
</roles>
<privileges>
<privilege>read</privilege>
<privilege>write</privilege>
</privileges>
</permission>

<permission>
<resources>
<resource>time_card</resource>
</resources>

<roles>
<role>employee</role>
</roles>
<privileges>
<privilege>read</privilege>
<privilege>write</privilege>
</privileges>
</permission>

<permission>
<resources>
<resource>time_card</resource>
</resources>
<roles>
<role>manager</role>
</roles>
<privileges>
<privilege>approve</privilege>
</privileges>
</permission>

</permissions>
</config>



Here is the code for the acl.py module:


"""Role based security"""
from xml.dom.minidom import parse

class AccessError(Exception):
pass

class Resource(object):
"""An Resource is an object that can be accessed by a Role."""
def __init__(self, name=''):
self.name = name
self._privileges = {}

def set_privilege(self, privilege, allowed=True):
self._privileges[privilege] = allowed

def has_access(self, privilege):
if privilege in self._privileges:
return self._privileges[privilege]
return False

def __str__(self):
rpr = self.name + ': '
for privilege, access in self._privileges.iteritems():
rpr += "%s:%s " % (privilege, access)
return rpr

class Role(object):
def __init__(self, name=''):
"""An Acl role has access to resources with specific privileges."""
self.name = name
self._parents = {}
self._resources = {}

def set_parent(self, parent):
self._parents[parent.name] = parent

def set_resource(self, resource):
self._resources[resource.name] = resource

def has_access(self, attr_name, privilege):
if attr_name in self._resources:
if self._resources[attr_name].has_access(privilege):
return True

for parent in self._parents.values():
if parent.has_access(attr_name, privilege):
return True

return False

def __str__(self):
rpr = self.name + ":\n"
rpr += "parents:\n"
for parent in self._parents.keys():
rpr += "\t%s\n" % parent
rpr += "resources:\n"
for resource in self._resources.values():
rpr += "\t%s\n" % resource.describe()
return rpr

class Acl(object):
"""Manages roles and resources.

Singleton class.
"""

class __impl:
"""Implementation of the singleton interface"""

def __init__(self):
self._acl = {}

def set_role(self, role):
self._acl[role.name] = role

def check_access(self, role_name, resource, privilege):
"""Check whether a role has access to a resource or not."""

if not role_name in self._acl:
raise AccessError('Role does not exist.')
return self._acl[role_name].has_access(resource, privilege)

def build_acl(self, file):
"""Build acl from an XML file."""

self._acl = {}
roles_to_create = {}
dom = parse(file)

# Find roles to create
roles_nodes = dom.getElementsByTagName('roleSet')
for roles_node in roles_nodes:
role_nodes = roles_node.getElementsByTagName('role')
for role_node in role_nodes:
name_nodes = role_node.getElementsByTagName('name')
parent_nodes = role_node.getElementsByTagName('inheritFrom')
role_name = name_nodes[0].childNodes[0].data
roles_to_create[role_name] = []

# Find role parents
for parent_node in parent_nodes:
roles_to_create[role_name].append(parent_node.childNodes[0].data)

# build inheritence chain
for role, parents in roles_to_create.iteritems():
self.set_role(self._create_role(role, roles_to_create))

# assign permissions
permissions = dom.getElementsByTagName('permissions')
for permissions_node in permissions:
permission_nodes = permissions_node.getElementsByTagName('permission')
for permission_node in permission_nodes:
resource_nodes = permission_node.getElementsByTagName('resource')
role_nodes = permission_node.getElementsByTagName('role')
privilege_nodes = permission_node.getElementsByTagName('privilege')

for resource_node in resource_nodes:
resource = Resource()
resource.name = resource_node.childNodes[0].data
for privilege_node in privilege_nodes:
resource.set_privilege(privilege_node.childNodes[0].data)

for role_node in role_nodes:
try:
role = self._acl[role_node.childNodes[0].data]
except:
raise AccessError('Role in permission is not defined.')

role.set_resource(resource)

def _create_role(self, role_name, roles_to_create):
"""Recursively create parent roles and then create child role."""

if role_name in self._acl:
role = self._acl[role_name]
else:
role = Role()
role.name = role_name

for parent_name in roles_to_create[role_name]:
if parent_name in self._acl:
parent = self._acl[parent_name]
else:
parent = self._create_role(parent_name, roles_to_create)
self.set_role(parent)
role.set_parent(parent)
return role

def __str__(self):
rpr = ''
for role in self._acl.values():
rpr += '----------\n'
rpr += role.describe()
return rpr

__instance = None

def __init__(self):
""" Create singleton instance """

# Check whether an instance already exists.
# If not, create it.
if Acl.__instance is None:
Acl.__instance = Acl.__impl()

self.__dict__['_Acl__instance'] = Acl.__instance

def __getattr__(self, attr):
""" Delegate get access to implementation """

return getattr(self.__instance, attr)

def __setattr__(self, attr, val):
""" Delegate set access to implementation """

return setattr(self.__instance, attr, val)

Monday, October 19, 2009

SQL Workshop

I've recently finished a SQL tutorial for a relational database workshop I'll be teaching. The tutorial covers basic SQL queries with SQLite and MySql, as well as basic database creation with SQLite. The tutorial is very basic, and the examples are geared toward biologists who need to use a database for performing research.

Sunday, September 20, 2009

Facebook's Tornado

Today I got my first chance to play with Facebook's Python based web server and framework Tornado. The framework is single threaded and asynchronous, similar to how the Twisted framework operates, and Facebook uses the technology to provide data to the 'Friend Feed'.

Python thread creation and maintenance have relatively high overhead, so asynchronous solutions can provide better performance and scalability than more traditional threaded server implementations. This is especially true for servers with high numbers of concurrent connections, and for long-polling and streaming connections that spend most of their time waiting around for a message to be published to a client. Tornado's benchmarks look impressive against threaded Python web servers, but I wonder why they didn't include Twisted, their closest competitor, in the benchmarking.

My first impression is that Tornado is much easier to use than Twisted, but also much less powerful. Twisted is a monolithic package that can be used with many different networking protocols, but Tornado is focused only on http. Tornado provides built in support for templating, authentication, and signed cookies. One really cool feature Tornado provides is semi-automatic XSRF protection, which should save time for developers who would otherwise need to manually implement countermeasures.

The documentation for Tornado is very slim. For example, I couldn't find any information about how to interact directly with Tornado's main event loop (IOLoop). Fortunately, Tornado's code base is small and readable, so reading the source will quickly get you up to speed.

While implementing a Tornado channel for the AmFast project, I ran into a problem that I have also encountered with Twisted. Both Tornado and Twisted use callbacks to complete a request. As an example, if a long-poll client is waiting for a message, you call RequestHandler.finish() to send a response back to the client when a message is published.

The above method works well for a single server instance, but what about when you have multiple servers behind a proxy? A server process may publish a message for a client that is connected to an entirely different server process. The publishing process has no way to notify the subscribing process that a client has received a message.

This problem can be solved by polling a database table or a file that is accessible to all server processes, but that takes system resources, especially if you have many clients and your polling interval is short. One of my future goals is to figure out a more elegant way for the publishing server process to notify the subscribing server process when a client receives a message.