Saturday, December 10, 2011

AmFast 0.5.3 Released

AmFast version 0.5.3 has been released. This release contains several important bug fixes. The code can be downloaded from PyPI or checked out from SVN.

Saturday, October 8, 2011

Testing With Browser Mob

The Project

I recently got the chance to work on a project using BrowserMob for automated testing. BrowserMob allows you to run Selenium test scripts "in the cloud". In my case I was testing not functionality but the performance of a Flex app. I needed to test latency and throughput of messages being dispatched through the Flex messaging system via AmFast (http://code.google.com/p/amfast/). This proved very difficult to test locally, but was a snap with BrowserMob.

The Testing

I created a simple Flex client to send and receive Flex messages in a way that replicated a production environment. I also created a custom server component to replicate the production environment and to help log message data to be analyzed later. After getting the client and server running locally, I signed up for a BrowserMob account and launched several browsers with their web interface.
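
The kind of analysis the logged message data feeds into can be sketched roughly as follows. This is only an illustration: the log format (pairs of send and receive timestamps in seconds) and the function name `summarize` are assumptions, not the format the actual server component wrote.

```python
from statistics import mean


def summarize(messages):
    """Summarize a list of (sent_at, received_at) timestamps in seconds."""
    latencies = [received - sent for sent, received in messages]
    first_sent = min(sent for sent, _ in messages)
    last_received = max(received for _, received in messages)
    duration = last_received - first_sent
    return {
        "mean_latency": mean(latencies),
        "max_latency": max(latencies),
        # Messages delivered per second over the whole run.
        "throughput": len(messages) / duration if duration > 0 else float("inf"),
    }


# Hypothetical log entries, for illustration only.
sample_log = [(0.0, 0.5), (1.0, 1.25), (2.0, 2.75), (3.0, 4.0)]
stats = summarize(sample_log)
```

Latency is measured per message, while throughput is measured over the whole run, so the two can move independently as the number of simulated browsers grows.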

The whole process was simpler than it should have been, and I was very impressed with how well it worked. I highly recommend trying out BrowserMob for performance and load testing, and I'm hoping to get a chance to run more full-featured automated functional tests in the future.

Tuesday, August 30, 2011

Paired Programming

Rally encourages its engineering staff to post to the company blog, so I decided to write a little about my initial experience with paired programming. TLDR: paired programming is better than I expected, increases code quality, and probably productivity.

Tuesday, July 12, 2011

tirtle: a Spring Web MVC project running on GAE

I decided to put together a simple web app on my way to learning Java and Spring. Tirtle allows users to track daily numbers such as how many calories they eat in a day. The project uses the Spring Web MVC framework and is running on Google App Engine.

The most frustrating part of the project was just getting everything configured and getting a working server up and running. I feel there is a lack of documentation aimed at beginners, although part of my problems may have been related to jumping right in with Spring + GAE. I chose Spring Framework version 3.0 (the latest version), but there seemed to be more documentation and blog tutorials available for version 2.5. I found an official Web MVC tutorial on the Spring Source site, but it only covered version 2.5.

I had several problems figuring out the base configuration settings, and getting all the correct .jar files in my classpath. webmvc.jar was especially mysterious, because it is not included in the Spring Framework distribution. I ended up finding it via a Google search, but I have yet to find the official download from Spring Source.

Once I got a basic server running, the available documentation for how to actually use Spring seemed pretty good. The Web MVC framework works similarly to every other MVC framework you have used. Classes are annotated to turn them into controllers, and controller methods are annotated to turn them into request handlers. You can use either JSPs (Java Server Pages: Java embedded in HTML, similar to PHP) or a templating system for your view layer.

The Web MVC stuff is all built on top of Spring's dependency injection framework, so the simple MVC annotations you use are actually shortcuts to lots of complicated XML configuration. Spring's standard DI tools can be used to inject other non-MVC dependencies into your controllers. I used the DI features to configure a simple authentication object (I didn't have time to figure out how to get Spring Security working) and also an ORM interface to Google's Datastore (objectify-appengine with the help of objectify-appengine-spring).

Unfortunately, the simple web app I developed didn't require much logic, so it didn't help too much in terms of learning how to code Java, but it was an excellent exercise in getting up and running with Spring. The code for the project is on GitHub. Other new-to-Spring users may find Spring easier to learn by starting with a working app like this one, and learning by modification. I hope to have time to add additional features to the code as I learn how they are accomplished in the Java/Spring ecosystem (unit tests, Spring Security, Javascript framework integration, REST API, templating instead of JSPs).

Wednesday, June 22, 2011

Python Vs Java

After using Python for the past several years, I'm going to be taking on a Java project. I am in the process of learning Java, and I thought I would write up a comparison of the features in each language.

Features Java has that Python doesn't:
  • Static typing
  • Strict access control (package, public, protected, private)
  • Traditional threading implementation wrapped in a decent API
  • Bytecode backwards compatibility

Features Python has that Java doesn't:
  • Dynamic objects
  • No explicit compile step
  • Properties (transparent getters/setters)
  • List comprehensions
  • Operator overloading
  • Generators (create iterators with the 'yield' statement)
  • Optional keyword arguments, *args, and **kwargs
  • PyPI (the cheeseshop) and pip

Static Typing

Static typing has two main advantages: type errors can be caught at compile time, and the compiler can make more optimizations. Statically typed code takes more time to write (creating explicit interface definitions), but you don't need to write the manual type-checking functions that duck-typed code often needs in a dynamic language. Static typing also allows you to avoid runtime type errors that you would probably need a unit test to catch with a dynamic language. Javascript and Python both have decent JIT compilers available, so the speed difference between dynamic languages and static languages will continue to narrow.
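
A manual check of the kind described above might look like this in Python. It is a minimal sketch; `read_all` is a made-up helper, and the check tests for the behaviour the function needs rather than for a particular class.

```python
import io


def read_all(source):
    """Return the full contents of any file-like object."""
    # Manual "type check": verify the behaviour we need, not the class.
    if not callable(getattr(source, "read", None)):
        raise TypeError(
            "expected an object with a read() method, got %s"
            % type(source).__name__
        )
    return source.read()


contents = read_all(io.StringIO("hello"))

try:
    read_all(42)
except TypeError as exc:
    # The mistake only surfaces when this line actually runs.
    error = str(exc)
```

A Java compiler would reject the equivalent of `read_all(42)` before the program ever ran; here it takes a test (or a production failure) to find it.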

Strong vs Weak Typing

While static typing may be helpful for some projects, I believe the biggest factor in type usability is not static vs dynamic but strong vs weak. Weakly typed languages allow you to cast objects from one type to another. If you've ever used Perl, PHP, or Javascript you've probably run into some hard-to-debug problems that were caused by implicit casting or automatic type coercion. Languages like these usually have confusing operators like '==='. C is stricter about implicit conversion, but it will let you manually cast in unsafe ways. Java also allows casting, but is safer than C, as it will throw a runtime exception if you try to cast to an incompatible type. On the other hand, Python is strongly typed. There is no such thing as a 'cast' in Python, and there are very few situations where automatic type conversion takes place (arithmetic with operands of different number types automatically converts all operands to the widest type used in the expression). Python's combination of strongly typed objects and dynamic objects with duck typed interfaces is a winner.
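
A short illustration of what strong typing means in practice in Python (the variable names are only for the example):

```python
# Mixing a string and a number is an error, not a silent coercion.
try:
    "1" + 1
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

# Numeric operands are the exception: int + float is widened to float.
widened = 1 + 2.5

# Anything else must be converted explicitly, which builds a new object
# rather than reinterpreting the existing one.
explicit = int("1") + 1
```

Here `mixed_ok` ends up `False`, `widened` is a float, and `explicit` is an ordinary int, with no '==='-style operator needed to tell the cases apart.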

Access Controls

Python has no access control modifiers, and instead uses a convention of naming private attributes with a leading underscore: '_private_method'. Client code is not 'supposed' to use attributes named with a leading underscore, but there is nothing technical stopping it. I must admit there have been several times when I wish Python had some equivalent construct. Java's 'final' modifier is particularly useful. Considering that many of Python's standard data types are immutable (string, unicode, tuple, frozenset), it's surprising that Python does not offer an easy way to define immutable objects. In Java it's as easy as adding 'private final'.
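
The convention, and two common workarounds for read-only data, can be sketched like this. `Account` and `Point` are hypothetical names; neither workaround is as strict as Java's 'private final'.

```python
from collections import namedtuple


class Account(object):
    def __init__(self, balance):
        self._balance = balance  # leading underscore: private by convention only

    @property
    def balance(self):
        # Read-only property: no setter is defined.
        return self._balance


account = Account(100)
try:
    account.balance = 5
except AttributeError:
    property_writable = False
else:
    property_writable = True

# Nothing technical stops client code from reaching past the convention.
account._balance = 5

# A namedtuple gives immutable fields, but only for simple records.
Point = namedtuple("Point", ["x", "y"])
point = Point(1, 2)
try:
    point.x = 10
except AttributeError:
    point_writable = False
else:
    point_writable = True
```

The property blocks accidental assignment, but the underscore attribute behind it is still writable, which is exactly the gap described above.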

Threading

Unlike Python and the GIL, Java can utilize multiple cores when executing threads, and its concurrency interface is wrapped in a nice API. However, I'm not sure how much I'll get to use the threading features. Networking code is increasingly being moved away from threaded implementations to asynchronous/non-blocking solutions such as Node.js and Twisted, and computation is being distributed on a cluster or 'in the cloud', instead of being run on a single machine.

Other Features

Java lacks many useful features present in Python, and I'm sure I will miss many of them. I hope the list of useful Java constructs grows as I learn more about the language and start working on a production code base.
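
For reference, here is a compact sketch touching several of the Python features from the list at the top of this post. The class and function names are invented for the example.

```python
class Vector(object):
    def __init__(self, x, y):
        self._x, self._y = x, y

    @property
    def x(self):                      # property: transparent getter
        return self._x

    @property
    def y(self):
        return self._y

    def __add__(self, other):         # operator overloading
        return Vector(self.x + other.x, self.y + other.y)


def countdown(start):                 # generator: 'yield' builds an iterator
    while start > 0:
        yield start
        start -= 1


def describe(*args, **kwargs):        # optional positional and keyword arguments
    return len(args), sorted(kwargs)


total = Vector(1, 2) + Vector(3, 4)
squares = [n * n for n in range(5) if n % 2 == 0]   # list comprehension
ticks = list(countdown(3))
summary = describe(1, 2, sep=",", end="")
```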



Sunday, May 29, 2011

Javascript File Browser for Server-Side Files

Problem - Galaxy


Galaxy is a web-based bioinformatics toolkit that allows users to create customized data analysis pipelines. It is becoming an extremely common tool, especially in the sequencing field. The project has one major flaw: it is difficult to get large data files into the system. Uploading 2 GB files (generated from sequencing runs) through a browser is not a workable solution.


Solution - iRods


iRods is a virtual file system commonly used to transfer and share files in the scientific community. It abstracts the storage details and provides tools for access control, sharing, metadata tracking, file type conversion, and high performance multi-threaded file transfer.

Integration


My co-workers Fred Bevins and Susan Miller and I were tasked with integrating iRods and Galaxy during the iRods code sprint hosted by iPlant in April. My share of the work was a client-side Javascript file browser. The file browser allows users to browse and select server-side files. The version I built talks to the iPlant Foundational API, which exposes iRods directories and files. Other back-ends could easily be added to allow the file browser to talk with a standard file system, or any other file repository. Fred and Susan worked on integrating the file browser into Galaxy, which allows Galaxy users to browse and select files in an iRods repository for use in an analysis. The Javascript code is available now, and the Galaxy integration code should be available sometime soon.

Monday, April 4, 2011

Using Protovis to Create Simple Flow Charts

Protovis is a Javascript library for creating SVG graphics to visualize datasets. The API is great, and I've been using it to visualize all sorts of data for a project I'm working on. I had a need to display a very simple (< 15 nodes) branching flow chart. The screen is simple enough that it doesn't justify a custom Protovis layout component or anything fancy like that.

I cooked up a scheme where the nodes are absolutely positioned div elements that can be styled with CSS, and the edges are drawn with Protovis. I pass an object that defines the edge properties to a Javascript function that uses jQuery to find the exact positions of the nodes, and then I use Protovis to draw the edges.

Example:



HTML:
<div id="workflowContainer">
  <!--
    -- Draw Simple divs to represent workflow nodes, and connect them with Protovis.
    --
    -- Nodes are positioned absolutely.
    -- Node positions can be static and manually determined,
    -- or dynamic and determined by server-side or client-side
    -- code. This example uses hard coded node positions.
    -->
  
  <div id="workflowChart" >

    <!-- Clickable node -->  
    <a href=""><div id="startFlow" style="top: 0; left: 440px;">Start</div></a>

    <!-- Foo branch -->

    <!-- Unclickable node -->  
    <div id="foo1Flow" style="top: 100px; left: 200px;">Foo 1</div>
  
    <a href=""><div id="foo2Flow"  style="top: 175px; left: 100px;">Foo 2</div></a>
  
    <div id="fooChoice1Flow"  style="top: 300px; left: 0px;">Foo Choice 1</div>
  
    <div id="fooChoice2Flow" class="inactive" style="top: 300px; left: 165px;">Foo Choice 2</div>
  
    <div id="fooChoice3Flow" class="inactive" style="top: 300px; left: 360px;">Foo Choice 3</div>
  
    <div id="fooOptionFlow"  style="top: 400px; left: 50px;">Foo Option</div>
  
    <a href=""><div id="fooCombineFlow"  style="top: 500px; left: 200px;">Foo Combine</div></a>
  
    <a href=""><div id="fooSplit1Flow"  style="top: 575px; left: 25px;">Foo Split 1</div></a>
  
    <a href=""><div id="fooSplit2Flow"  style="top: 575px; left: 250px;">Foo Split 2</div></a>
  
    <!-- bar branch -->
    <div id="barFlow" style="top: 100px; left: 700px;">Bar</div>
  
    <a href=""><div id="bar1Flow" class="inactive" style="top: 200px; left: 550px;">Bar 1</div></a>
  
    <a href=""><div id="bar2Flow" class="inactive" style="top: 200px; left: 825px;">Bar 2</div></a>
  </div>
</div>

CSS:
/* Contains both nodes and edges. */
#workflowContainer {
 position: relative;
 width: 1000px;
}

/* This is where the edges will be drawn by protovis. */
#workflowContainer span {
 position: absolute;
 top: 0;
 left: 0;
 background: transparent;
 z-index: 1000; /* SVG needs to be drawn on top of existing layout. */
}

#workflowChart {
 position: relative;
 top: 0;
 left: 0;
 height: 700px;
 width: 1000px;
}

#workflowChart div {
 border-color: #5b9bea;
 background-color: #b9cde5;
 position: absolute;
 margin: 0;
 padding: 4px;
 border: 2px solid #5b9bea;
 background: #b9cde5;
 border-radius: 4px;
 -moz-border-radius: 4px;
 -webkit-border-radius: 4px;
 color: #000;
 z-index: 10000; /* Needs to be drawn on top of SVG to be clickable. */
}

#workflowChart a {
 cursor: pointer;
}

#workflowChart a div {
 border-color: #f89c51;
 background: #fcd5b5;
}

#workflowChart div.inactive {
 border-color: #ccc;
 background-color: #eee;
 color: #ccc;
}

#workflowChart div:hover {
 border-color: #700000;
}


Javascript:
/* Initialize workflow screen. */
var initWorkflow = function() {
    // List HTML nodes to connect.
    //
    // The edges are hardcoded in this example,
    // but could easily be made dynamic.
    var edges = [
        {
            source: 'startFlow',
            target: 'foo1Flow'
        },
        {
            source: 'foo1Flow',
            target: 'foo2Flow'
        },
        {
            source: 'foo2Flow',
            target: 'fooChoice1Flow'
        },
        {
            source: 'foo2Flow',
            target: 'fooChoice2Flow'
        },
        {
            source: 'foo2Flow',
            target: 'fooChoice3Flow'
        },
        {
            source: 'fooChoice1Flow',
            target: 'fooOptionFlow'
        },
        {
            source: 'fooChoice2Flow',
            target: 'fooOptionFlow'
        },
        {
            source: 'fooOptionFlow',
            target: 'fooCombineFlow'
        },
        {
            source: 'fooChoice3Flow',
            target: 'fooCombineFlow'
        },
        {
            source: 'fooCombineFlow',
            target: 'fooSplit1Flow'
        },
        {
            source: 'fooCombineFlow',
            target: 'fooSplit2Flow'
        },
        {
            source: 'startFlow',
            target: 'barFlow'
        },
        {
            source: 'barFlow',
            target: 'bar1Flow'
        },
        {
            source: 'barFlow',
            target: 'bar2Flow'
        }
    ];
      
    // Use jQuery to set height and width equal to background div.
    var workflow = $('#workflowChart'),
        h = workflow.height(),
        w = workflow.width();
  
    // Create Protovis Panel used to render SVG.
    var vis = new pv.Panel()
        .width(w)
        .height(h)
        .antialias(false);
      
    // Attach Panel to dom
    vis.$dom = workflow[0];
      
    // Render connectors
    drawEdges(vis, edges);
    vis.render();
 };
 
 /* Draw edges specified in input array. */
 var drawEdges = function(vis, edges) {
     // Direction indicators,
     var directions = []; 
 
     $.each(edges, function(idx, item){
         // Color of edges
         var color = '#000';
         
         // Arrow radius         
         var r = 5;
         
         // Use JQuery to get source and destination elements
         var source = $('#' + item.source);
         var target = $('#' + item.target);
         
         if (!(source.length && target.length)) {
             // One of the nodes is not present in the DOM; skip it.
             return;
         }
         
         var data = edgeCoords(source, target);
         if (item.sourceLOffset) {
             data[0].left += item.sourceLOffset;
         }
         if (item.targetLOffset) {
             data[1].left += item.targetLOffset;
         }
         
         if (source.hasClass('inactive') || target.hasClass('inactive')) {
             // If either node is inactive, gray out the edge.
             color = '#ccc';
         }
         
         // Use Protovis to draw edge line.
         vis.add(pv.Line)
             .data(data)
             .left(function(d) {return d.left;})
             .top(function(d) {
                 if (d.type === 'target') {
                     return d.top - (r * 2);
                 }
                 
                 return d.top;
              })
             .interpolate('linear')
             .segmented(false)
             .strokeStyle(color)
             .lineWidth(2);
         
         // Here you may want to calculate an angle
         // to twist the direction arrows to make the graph
         // prettier. I've left out the code to keep things simple.
         var a = 0;
         
         // Add direction indicators to array.
         var d = data[1];
         directions.push({
             left: d.left,
             top: d.top - (r * 2),
             angle: a,
             color: color
         });
     });
     
     // Use Protovis to draw all direction indicators
     //
     // Here you may want to check and make
     // sure you're only drawing a single indicator
     // at each position, to avoid drawing multiple
     // indicators for targets that have multiple sources.
     // I've left out the code for simplicity.
     vis.add(pv.Dot)
         .data(directions)
         .left(function (d) {return d.left;})
         .top(function (d) {return d.top;})
         .radius(r)
         .angle(function (d) {return d.angle;})
         .shape("triangle")
         .strokeStyle(function (d) {return d.color;})
         .fillStyle(function (d) {return d.color;});
 };
 
 /* Returns the bottom-middle offset for a dom element. */
 var bottomMiddle = function(node) {
     var coords = node.position();
     coords.top += node.outerHeight();
     coords.left += node.width() / 2;
     return coords;
 };
 
 /* Returns the top-middle offset for a dom element. */
 var topMiddle = function(node) {
     var coords = node.position();
     coords.left += node.width() / 2;
     return coords;
 };
 
 /* Return start/end coordinates for an edge. */
 var edgeCoords = function(source, target) {
     var coords = [bottomMiddle(source), topMiddle(target)];
     coords[0].type = 'source';
     coords[1].type = 'target';
     return coords;
 };
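
The two pieces left out of drawEdges above (the arrow angle and the one-indicator-per-position check) come down to a little arithmetic. Here is a rough sketch in Python, used only to keep the arithmetic separate from the Protovis calls; porting it to Javascript is direct. The function names are made up, and how the resulting angle maps onto the rotation Protovis expects for a triangle is an assumption you would need to check against the Protovis docs.

```python
import math


def edge_angle(source, target):
    """Angle of the line from source to target, in radians.

    Screen coordinates are assumed: 'left' grows rightwards and
    'top' grows downwards, as in the coords returned by edgeCoords.
    """
    return math.atan2(target["top"] - source["top"],
                      target["left"] - source["left"])


def unique_indicators(indicators):
    """Keep only the first indicator seen at each (left, top) position."""
    seen = set()
    unique = []
    for indicator in indicators:
        key = (indicator["left"], indicator["top"])
        if key not in seen:
            seen.add(key)
            unique.append(indicator)
    return unique


# An edge pointing straight down the screen.
straight_down = edge_angle({"left": 100, "top": 0}, {"left": 100, "top": 50})

# Two edges arriving at the same target produce duplicate indicators.
indicators = unique_indicators([
    {"left": 200, "top": 490, "color": "#000"},
    {"left": 200, "top": 490, "color": "#ccc"},
    {"left": 25, "top": 565, "color": "#000"},
])
```

Keeping the first indicator at each position means an active edge listed before an inactive one wins the color; reorder the edges or compare colors if you want a different rule.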

Wednesday, March 16, 2011

PyCon 2011 Report

Here is a presentation covering the status of Python and sessions I attended at PyCon 2011:


Friday, February 18, 2011

SLM (Sample Lifecycle Manager)

SLM

We released the latest version of SLM (Sample Lifecycle Manager) on February 1st, and the site has been a resounding success so far. SLM supports life sciences laboratory services offered by UAGC including:
  • DNA extraction
  • Sanger sequencing
  • DNA fragment analysis (str/microsatellite)
  • Sequenom genotyping
  • Sequenom methylation analysis
  • Taqman genotyping
  • 454 sequencing
  • Ion Torrent sequencing (coming soon)

EAGER

SLM is built with Eager, an application framework for developing custom LIMS. Eager is a collection of Django apps that provide common LIMS functionality including:
  • Workflow management with GLP compliant status logging
  • GLP compliant user and lab access control and management
  • Sample/tube/grid submission and management
  • Volume and concentration tracking
  • Automated sample and reagent dilution and 'cherry picking' transfers
  • Reagent lot tracking
  • Data management and collaboration
  • Integration with SOP management system
  • Environmental monitoring

The core features of Eager can be used 'out-of-the-box' for a complete LIMS solution with a generic sample tracking workflow, or can be customized to provide service-specific workflows (such as Sequenom, 454, Ion Torrent, etc.). The framework includes tons of features, and additional workflows can be easily added by an experienced Django developer. Custom workflows are simply custom Django apps that hook into Eager's workflow definition system. All client-side code is written with the Dojo framework.

I am hoping to release the Eager framework on GitHub this spring or summer (it will be the first "open-source LIMS that doesn't suck"), but it currently needs to be reviewed by our IP/legal department first.

Sunday, January 2, 2011

Django ORM Tools

The Django ORM is a great tool that makes it easy to work with simple data models, but it quickly shows its limitations as the complexity of the data model grows. The orm_tools module is an attempt to keep the simplicity of the Django ORM, while adding some extra features that make it much easier to work with complex object graphs. The code is available on Django Snippets.

Object Instances/Sessions

The Django ORM loads each object separately from the database. If different QuerySets select multiple objects with the same primary key, the resulting objects will all be different instances.


>>> MyModel.objects.get(pk=1) is MyModel.objects.get(pk=1)
False


The SQLAlchemy ORM solves this problem with sessions. orm_tools contains a Session class to provide similar functionality. Use the 'with' statement in combination with a Session instance to force QuerySets to retrieve cached object instances from the session.


>>> from orm_tools import Session
>>> with Session():
...     MyModel.objects.get(pk=1) is MyModel.objects.get(pk=1)
...
True


When QuerySet objects are executed inside of the 'with' block, all SQL queries are performed normally, but cached object instances are returned if an instance with an identical primary key already exists in the session. The session applies throughout any code called from within the 'with' block. Any objects inserted into the DB within the 'with' block are automatically added to the session.
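
The idea behind a session (often called an identity map) can be sketched without Django. This is not the orm_tools implementation; `IdentityMapSession`, `register`, `Row`, and `load` are all hypothetical names used to show the mechanism.

```python
import threading

_state = threading.local()


class IdentityMapSession(object):
    """Context manager that activates an instance cache for this thread."""

    def __enter__(self):
        self._previous = getattr(_state, "cache", None)
        _state.cache = {}
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore whatever was active before the 'with' block.
        _state.cache = self._previous
        return False


def register(model_name, pk, instance):
    """Return the cached instance for (model_name, pk), caching it if new."""
    cache = getattr(_state, "cache", None)
    if cache is None:
        return instance  # no active session: behave as usual
    return cache.setdefault((model_name, pk), instance)


class Row(object):
    """Stand-in for a freshly loaded model instance."""

    def __init__(self, pk):
        self.pk = pk


def load(pk):
    # Always 'queries' (builds a new Row), then defers to the session.
    return register("Row", pk, Row(pk))


outside = load(1) is load(1)
with IdentityMapSession():
    inside = load(1) is load(1)
```

As in the description above, the query still runs every time; the session only decides which instance is handed back.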

Object Graphs

The Django ORM does not automatically save model object dependencies, so Django model instances must be saved one at a time.


>>> parent = MyModel()
>>> child = MyChild(parent=parent)
>>> child.save()
IntegrityError: app_mychild.parent_id may not be NULL


For simple data models, this problem is easily fixed by inserting the models into the database at the same time that they are created.


>>> parent = MyModel.objects.create()
>>> child = MyChild.objects.create(parent=parent)


However, this is not always ideal for more complex data models, especially if the objects involved already exist in the database, and changes need to be persisted by updating existing rows. orm_tools contains a GraphSaver class that will save an entire object graph at once.


>>> from orm_tools import GraphSaver
>>> parent = MyModel()
>>> child = MyChild(parent=parent)
>>> saver = GraphSaver()
>>> saver.save(child)


When the 'save' method of the GraphSaver object is called, all dependencies will be detected and their 'save' methods will be called in the correct order, so that the entire object graph is saved. The GraphSaver's 'save' method works equally well for both inserts and updates, although updates can optionally be ignored by setting the 'update' argument to False. In the future, I hope to increase performance significantly by modifying the code to execute batched insert/update queries for databases that support it (PostgreSQL with psycopg2).
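
Saving "in the correct order" amounts to a depth-first walk that saves each object's dependencies before the object itself. The sketch below shows that ordering with plain Python objects; it is not the GraphSaver code, and `Node` and `save_graph` are hypothetical stand-ins for model instances and the saver.

```python
class Node(object):
    """Stand-in for a model instance; 'parents' plays the role of foreign keys."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.saved = False

    def save(self, log):
        # A real save would fail here if a parent had no primary key yet.
        assert all(parent.saved for parent in self.parents)
        self.saved = True
        log.append(self.name)


def save_graph(obj, log, visited=None):
    """Depth-first save: dependencies first, then the object itself."""
    visited = set() if visited is None else visited
    if id(obj) in visited:
        return  # already handled via another path through the graph
    visited.add(id(obj))
    for parent in obj.parents:
        save_graph(parent, log, visited)
    obj.save(log)


grandparent = Node("grandparent")
parent = Node("parent", [grandparent])
child = Node("child", [parent, grandparent])
order = []
save_graph(child, order)
```

Calling `save_graph(child, order)` leaves `order` listing the grandparent first and the child last, and each object is saved once even though the grandparent is reachable by two paths.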

Collections

The Django ORM supports one-to-many object relations. Objects on the 'many' side of a one-to-many relation cannot be attached to the 'one' unless the 'one' is already saved in the database. This causes some of the same problems as described in the 'Object Graphs' section. The orm_tools module contains a Collection class that enables 'many' objects to be added to a 'one' object, regardless of whether the 'one' object has been saved yet.


from django.db import models

from orm_tools import Collection

class One(models.Model):
    label = models.CharField(default='blank', max_length=20)

# Call the 'set_property' static method
# to create a collection object.
#
# Arguments
# ==========
# * Model to add collection to
# * Collection attribute name
# * Many's foreign key attribute name
# * One's 'many set' attribute name
Collection.set_property(One, 'children', 'parent', 'many_set')

class Many(models.Model):
    label = models.CharField(default='blank', max_length=20)
    parent = models.ForeignKey(One, null=False)



>>> from orm_tools import GraphSaver
>>> one = One()
>>> one.children.add(Many())
>>> one.children.add(Many())
>>> saver = GraphSaver()
>>> saver.save(one)


The Collection object can be iterated through, indexed, and sliced regardless of whether the 'one' object and the 'many' objects have been saved yet. The GraphSaver's 'save' method will also automatically save all 'many' objects.
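
The buffering behaviour can be pictured as a list that sets the foreign key on each item as it is added. This is a standalone sketch rather than the Collection class itself; `PendingCollection`, `Parent`, and `Child` are hypothetical names, and no database is involved.

```python
class PendingCollection(object):
    """Holds 'many' objects in memory and points each one at its owner."""

    def __init__(self, owner, fk_name):
        self._owner = owner
        self._fk_name = fk_name
        self._items = []

    def add(self, item):
        # Set the foreign key attribute now; saving can happen later.
        setattr(item, self._fk_name, self._owner)
        self._items.append(item)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]  # handles both integers and slices


class Parent(object):
    def __init__(self):
        self.children = PendingCollection(self, "parent")


class Child(object):
    def __init__(self, label):
        self.label = label
        self.parent = None


one = Parent()
for label in ("a", "b", "c"):
    one.children.add(Child(label))

labels = [child.label for child in one.children]
tail = [child.label for child in one.children[1:]]
```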