limscoder: 2010

Saturday, October 16, 2010

Apache, Virtual Hosts, and HTTPS

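Without SNI support, Apache cannot use HTTPS with name-based virtual hosts because of the way the SSL handshake works: the server has to present its certificate before the client sends the Host header, so Apache cannot tell which vhost's certificate to use. I've run across this problem several times in the past, and I always forget how to solve it, so I'll record it here for posterity.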

To get things working, the Apache setup needs to be changed from name-based virtual hosting to IP-based virtual hosting. First, configure a separate IP address on the server for each vhost that requires HTTPS. Then update the Apache config files (/etc/httpd/conf/ on RHEL, Apache 2.2) to use IP-based virtual hosting.

If name-based vhosting was previously configured, it will need to be modified. If all vhosts are being converted to IP-based vhosting, then name-based vhosting can be turned off completely by commenting out or deleting any 'NameVirtualHost' directives. However, it is also possible to continue to use name-based vhosting for vhosts that do not require HTTPS. Any existing 'NameVirtualHost' directives that contain wildcards ('NameVirtualHost *:80') will need to be modified. Replace the wildcard with the IP that will be shared by the name-based vhosts.

Next, modify any existing 'VirtualHost' directives that contain wildcards in their definition ('VirtualHost *:80'). Replace the wildcard with the IP that the vhost will be using. Virtual hosts that do not require HTTPS can continue to use name-based virtual hosting and can share the same IP, but every vhost that requires HTTPS must use its own unique IP address.

Finally, configure a 'VirtualHost' directive for each IP-based vhost in the SSL section of the Apache configuration ('/etc/httpd/conf.d/ssl.conf' on RHEL, Apache 2.2). Any name-based vhosts will continue to share the SSL config within the '_default_:443' 'VirtualHost' directive. Restart Apache for the changes to take effect.
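As a rough illustration of the steps above, the fragment below mixes both styles on Apache 2.2. The addresses (192.0.2.10 and 192.0.2.11, taken from the documentation address range), hostnames, document roots, and certificate file names are all placeholders, not values from a real setup.

```apache
# httpd.conf: name-based vhosts share one IP (no HTTPS needed)
NameVirtualHost 192.0.2.10:80

<VirtualHost 192.0.2.10:80>
    ServerName www.example.com
    DocumentRoot /var/www/www.example.com
</VirtualHost>

<VirtualHost 192.0.2.10:80>
    ServerName blog.example.com
    DocumentRoot /var/www/blog.example.com
</VirtualHost>

# httpd.conf: IP-based vhost, plain HTTP side
<VirtualHost 192.0.2.11:80>
    ServerName secure.example.com
    DocumentRoot /var/www/secure.example.com
</VirtualHost>

# conf.d/ssl.conf: the same vhost on its dedicated IP, HTTPS side
<VirtualHost 192.0.2.11:443>
    ServerName secure.example.com
    DocumentRoot /var/www/secure.example.com
    SSLEngine on
    SSLCertificateFile /etc/pki/tls/certs/secure.example.com.crt
    SSLCertificateKeyFile /etc/pki/tls/private/secure.example.com.key
</VirtualHost>
```

Because 192.0.2.11 is never named in a 'NameVirtualHost' directive, Apache selects that vhost by address alone, which is what lets it pick the right certificate during the handshake.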

Saturday, September 4, 2010

Transactions for File Transfer

Transactions

Database transactions are a convenient way to maintain consistent state during data processing functions. If an error occurs during processing, just roll back the transaction so that incomplete or incorrect data is never stored.

Problem

I've worked on many problems where data processing involves retrieving a source file, performing some type of processing, and then writing to a destination file. These functions are tricky, because if a problem arises during the processing, you're left with an inconsistent, partially processed batch of files. This problem is especially pronounced if you're storing file metadata in a database. If you perform a rollback of your database transaction when an error occurs, then you've lost any updated metadata about the files that were processed correctly.

Solution

In an attempt to remedy this problem, I've developed a somewhat naive implementation of a file transaction class that can be used to maintain consistent state during processing functions involving many files. The transaction object keeps track of all files that have been created and all files that should be deleted. All files marked for deletion are deleted when a commit occurs. All files marked as created are removed when a rollback occurs. If a file needs to be moved, it is instead copied: the source file is marked for deletion and the destination file is marked as created.

Implementation


import os
import shutil


class TransactionError(Exception):
    """Raised when no transaction is active or a file is already locked."""


class Transaction(object):
    """
    Manages transactions for file storage.

    Assumes each file is only being operated on by one person at a time.

    If multiple users try to operate on the same file, then the last
    to access gets an exception.
    """

    lock_postfix = 't_lock'

    def __init__(self):
        self._level = 0

    def _get_lock_path(self, path):
        """Return lock file path."""

        # Drop one trailing slash so a directory gets a sibling lock file.
        if path.endswith('/'):
            path = path[:-1]

        return path + '.%s' % self.lock_postfix

    def _set_files(self):
        """Resets file lists."""

        self._files_added = set()
        self._files_removed = set()
        self._dirs_added = set()
        self._dirs_removed = set()
        self._locked_files = set()

        # Unlike the other types,
        # move operations
        # must be ordered!!
        self._files_moved = []

    def _check_level(self):
        """Raises exception if level is not 1 or above."""

        if self._level < 1:
            raise TransactionError('Transaction not active.')

    def _rm(self, file_paths, dir_paths):
        """Remove all files."""

        for dir_path in dir_paths:
            if os.path.exists(dir_path):
                shutil.rmtree(dir_path)

        for file_path in file_paths:
            if os.path.exists(file_path):
                os.unlink(file_path)

    def _rev_moves(self):
        """Reverse moved files."""

        # Undo the most recent move first.
        for src_path, dest_path in reversed(self._files_moved):
            shutil.move(dest_path, src_path)

    def _acquire_lock(self, path):
        """Attempt to lock a file."""

        # Make sure transaction is started
        self._check_level()

        if path not in self._locked_files:
            # Create lock file on file system
            lock_path = self._get_lock_path(path)
            if os.path.exists(lock_path):
                # Multi-user access is not allowed!
                raise TransactionError('File is locked.')
            with open(lock_path, 'w') as out_file:
                out_file.write('\n')
            self._locked_files.add(path)

    def _release_lock(self, path):
        """Release a lock file."""

        lock_path = self._get_lock_path(path)
        if os.path.exists(lock_path):
            os.unlink(lock_path)
        self._locked_files.discard(path)

    def _release_locks(self):
        """Release all locks."""

        locked_paths = self._locked_files.copy()
        for path in locked_paths:
            self._release_lock(path)

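    def copy_file(self, src_path, dest_path, remove_existing=False, directory=False):
        """Copy a file. Set remove_existing to True to move file."""

        # Lock the destination first, so nothing is copied if the
        # transaction is inactive or the destination is already locked.
        self._acquire_lock(dest_path)

        if directory is True: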
            shutil.copytree(src_path, dest_path, symlinks=True)
        else:
            shutil.copyfile(src_path, dest_path)
        self.add_file(dest_path, directory=directory)

        if remove_existing is True:
            self.remove_file(src_path, directory=directory)

    def add_file(self, path, directory=None):
        """Add a file to the transaction."""

        self._check_level()

        self._acquire_lock(path)

        if directory is None:
            directory = os.path.isdir(path)

        if directory is True:
            self._dirs_added.add(path)
        else:
            self._files_added.add(path)

    def remove_file(self, path, directory=None):
        """Remove a file from the transaction."""

        self._check_level()

        self._acquire_lock(path)

        if directory is None:
            directory = os.path.isdir(path)

        if directory is True:
            self._dirs_removed.add(path)
        else:
            self._files_removed.add(path)

    def move_file(self, src_path, dest_path):
        """Move a file from one location to another."""

        self._check_level()

        self._acquire_lock(src_path)
        self._acquire_lock(dest_path)

        shutil.move(src_path, dest_path)
        self._files_moved.append((src_path, dest_path))

    def begin(self):
        """Begin transaction."""

        if self._level == 0:
            self._set_files()

        self._level += 1

    def commit(self):
        """Removes all 'removed' files and dirs."""

        self._check_level()

        self._level -= 1
        if self._level == 0:
            self._rm(self._files_removed, self._dirs_removed)
            self._release_locks()

    def rollback(self):
        """Removes all 'added' files and dirs."""

        self._check_level()

        self._level -= 1
        if self._level == 0:
            self._rm(self._files_added, self._dirs_added)
            self._rev_moves()
            self._release_locks()

Example


def process():
    transaction = Transaction()
    transaction.begin()
    try:
        # Mark a file as created
        transaction.add_file(new_file)

        # Mark a file as deleted
        transaction.remove_file(delete_file)

        # Copy a file
        transaction.copy_file(src_file, dest_file)

        # Move a file
        transaction.move_file(mov_src_file, mov_dest_file)
        transaction.commit()
    except:
        transaction.rollback()
        raise
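The begin/commit/rollback pattern in the example above could also be wrapped in a context manager. This is only a sketch: 'file_transaction' is a hypothetical helper that is not part of the class, and it assumes nothing about the object it receives beyond the begin(), commit(), and rollback() methods shown earlier.

```python
from contextlib import contextmanager


@contextmanager
def file_transaction(transaction):
    """Begin a transaction, commit on success, roll back on any error."""
    transaction.begin()
    try:
        yield transaction
    except BaseException:
        # Like the bare 'except' in the example: undo, then re-raise.
        transaction.rollback()
        raise
    else:
        # Only reached when the body finished without raising.
        transaction.commit()
```

With this helper, the body of process() would sit inside 'with file_transaction(Transaction()) as transaction:'. One difference from the example is that commit() runs outside the try block, so an error raised during commit is not followed by a rollback attempt.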

Limitations

The class only works for single-user environments. A lock file is created for every file added to a transaction. If a different transaction tries to acquire a lock for a file that is already locked, an exception is raised. Negotiating multi-user access would be quite tricky, especially in the case of deleted files, where the file no longer exists after the lock is released.
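A related gap is that checking for the lock file and then creating it are two separate steps, so two processes could both pass the check before either one writes the file. A minimal sketch of an atomic alternative follows, assuming a local file system. The names 'acquire_lock_file' and 'LockError' are hypothetical and not part of the class above.

```python
import errno
import os


class LockError(Exception):
    """Hypothetical error type for this sketch."""


def acquire_lock_file(lock_path):
    """Create lock_path atomically; raise LockError if it already exists."""
    try:
        # O_CREAT | O_EXCL makes "check and create" one atomic operation:
        # the call fails with EEXIST if the file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            raise LockError('File is locked: %s' % lock_path)
        raise
    os.close(fd)
```

This only closes the race when the lock is taken. It does not address the harder problem of coordinating several users on the same file.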

Tuesday, June 1, 2010

AmFast 0.5.1 Released

AmFast 0.5.1 has been released. This is a bug-fix release and can be downloaded from PyPI.

Monday, May 10, 2010

Python Workshop

It's that time of the year again. Time to learn Python! I will be teaching a Python tutorial May 24-28. The tutorial will cover Python for users with little or no programming experience. Examples are geared toward biologists, but most concepts are applicable to all. Critiques of and comments on the workshop tutorial material are greatly appreciated.

Register for the workshop.

Thursday, April 22, 2010

Django + Dojo

When I work on HTML projects, I usually use the Dojo Toolkit for my Javascript needs. Lately I've been spending some time playing around with the Python web framework Django. I did some internet searching and found Dojango, a project that integrates Django with Dojo. Dojango has features for automatically turning Django form fields into Dijits (Dojo UI widgets), but unfortunately Dojango uses Dojo's custom HTML attributes with Dojo's parseOnLoad option. I prefer to create Dijits programmatically so that my markup stays clean, so I decided to develop a Django app to meet my needs.

Code is available from SVN. Instructions are below.

Instructions:

Settings

# Setup app in settings.py

# Required attributes:

# The URL to get dojo.js from
DOJO_URL = MEDIA_URL + 'js/dojo'

# Optional attributes:

# Set to True to add Dojo setup to template
DOJO_ENABLED = True

# Set Dojo theme
DOJO_THEME = 'tundra' 

# Set the value of djconfig
DOJO_DJCONFIG = {'isDebug': False, 'parseOnLoad': False, 
                 'modulePaths': {'app': MEDIA_URL + 'js/app'}}

# More on this later
DOJO_FORM_FUNCTION = None

# Attach middleware
MIDDLEWARE_CLASSES = (
    'dojo.middleware.DojoMiddleware',
    ...
)

The dojo object

The middleware attaches a Dojo object to each request. You can access the object from your views, and use it to set Dojo parameters.

# The dojo object has several useful attributes and methods.

# The path to dojo.js (from settings.DOJO_URL), read-only
request.dojo.src

# Theme
request.dojo.theme = 'soria'

# DjConfig
request.dojo.dj_config['isDebug'] = True

# Convenience method to set module paths in dj_config
request.dojo.set_module_path('custom', 'url_to_custom_module')

# Add stylesheets
request.dojo.append_stylesheet('url_to_custom_stylesheet')

# Require modules
request.dojo.append_module('module.to.require')

# Set function to addOnLoad
request.dojo.append_aol('function() {do_something();}')

# Set function to addOnLoad before other already set
request.dojo.prepend_aol('function() {do_something();}')
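To make the idea of a per-request Dojo object concrete, here is a rough sketch of how such middleware could be structured. It is not the app's actual code: the 'Dojo' class, its internal list attributes, and the class-level defaults are assumptions made for illustration. A real version would read DOJO_URL, DOJO_THEME, and DOJO_DJCONFIG from settings.py.

```python
import copy


class Dojo(object):
    """Hypothetical per-request holder for Dojo page settings."""

    def __init__(self, src, theme, dj_config):
        self._src = src
        self.theme = theme
        # Deep copy so one request's changes never leak into another's.
        self.dj_config = copy.deepcopy(dj_config)
        self.stylesheets = []
        self.modules = []
        self.aols = []

    @property
    def src(self):
        """Path to dojo.js; no setter is defined, so it is read-only."""
        return self._src

    def set_module_path(self, name, url):
        self.dj_config.setdefault('modulePaths', {})[name] = url

    def append_stylesheet(self, url):
        self.stylesheets.append(url)

    def append_module(self, name):
        self.modules.append(name)

    def append_aol(self, js_function):
        self.aols.append(js_function)

    def prepend_aol(self, js_function):
        self.aols.insert(0, js_function)


class DojoMiddleware(object):
    """Old-style (MIDDLEWARE_CLASSES) middleware sketch."""

    # Stand-ins for values a real app would read from settings.py.
    dojo_url = '/media/js/dojo'
    theme = 'tundra'
    dj_config = {'isDebug': False, 'parseOnLoad': False}

    def process_request(self, request):
        request.dojo = Dojo(self.dojo_url, self.theme, self.dj_config)
        return None  # let Django continue processing the request
```

Creating a fresh object in process_request is what lets each view change the theme or djConfig without affecting other requests.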

Forms

Django forms can easily be 'dijitized'.

from django import forms

from dojo import models as dojo

class Register(forms.Form):
    username = forms.RegexField(
        label='Choose a username (letters and numbers only)',
        min_length=2,
        max_length=16,
        regex=r'^[\w]{2,16}$',
        error_messages={
            'invalid': 'Username must be 2 to 16 characters long, and can only'
                       ' contain letters, numbers and underscores.'
        }
    )

    # Use the dojo_field function to attach dijit
    # parameters to a Django form field.
    dojo.dojo_field(username, 'dijit.form.ValidationTextBox')



   """
   dojo_field arguments

   required
   ==========
    * field - Django field to attach dijit parameters to.
    * dojo_type - str, the qualified name of the dijit class to use

   keyword
   =========
    * attr_map - dict, Used to map Django field parameters to Dijit parameters.
                 Overrides the default values in dojo.models.default_attr_map.
                 The dict elements should be structured as follows:

                 Key == Django attribute name
                 Value == tuple with elements:
                     [0] == Dijit attribute name to map to
                     [1] == None, or callable to convert Django value to Dijit value

                 EXAMPLE:
                 {
                     'max_length': ('maxLength', None),
                     'regex': ('regExp', lambda a: '%s' % a.pattern)
                 }
    * dojo_attrs - dict, Attributes will be applied directly to dijit.
                   Key == dojo attribute name
                   Value == dojo attribute value
   """
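The attr_map structure described above can be illustrated with a small stand-alone sketch. This is not the app's implementation: 'map_field_attrs' is a hypothetical helper, and it assumes that a user-supplied attr_map is merged over the defaults entry by entry and that dojo_attrs win over mapped values.

```python
# Same shape as the EXAMPLE in the docstring above.
default_attr_map = {
    'max_length': ('maxLength', None),
    'regex': ('regExp', lambda a: '%s' % a.pattern),
}


def map_field_attrs(field, attr_map=None, dojo_attrs=None):
    """Build a dict of Dijit parameters from a Django-like field object."""
    merged = dict(default_attr_map)
    merged.update(attr_map or {})

    params = {}
    for django_name, (dijit_name, convert) in merged.items():
        value = getattr(field, django_name, None)
        if value is None:
            # The field does not define this attribute; skip it.
            continue
        params[dijit_name] = convert(value) if convert else value

    # dojo_attrs are applied directly, overriding any mapped value.
    params.update(dojo_attrs or {})
    return params
```

The converter in each tuple exists because some Django values cannot be passed to Javascript unchanged. A compiled regular expression, for example, has to be reduced to its pattern string first.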

After creating a form, it must be instrumented to create the Javascript required to create the dijits.



def my_view(request):

    my_form = Register()

    # This call generates all the necessary Javascript code
    request.dojo.dojo_form(my_form)

    # By default, the function code generated
    # is a string to be added in-line.
    #
    # If you prefer to call a pre-defined JS function,
    # just set the request.dojo.form_function attribute.
    #
    # The value of the attribute should be a tuple where:
    # [0] == qualified Dojo module name where function exists
    # [1] == function name
    #
    # request.dojo.form_function can also be set automatically
    # by setting DOJO_FORM_FUNCTION in settings.py

Template

Include the following tags within the 'head' tag of your HTML template:


{% load dojo %}
{% dojo request.dojo %}

The Dojo app also includes a script for creating a Dojo build:


python manage.py dojo_build

Saturday, March 6, 2010

UI Design

In the world of scientific software, UIs tend to be horrifically awful. When we started working on our next generation LIMS (called SLM: Sample Lifecycle Manager), one of our goals was to provide an intuitive UI. This post covers some of what we went through on the UI design front.

First Try

After evaluating several Javascript libraries, we decided that Flex would be a good choice, because it would be easy to create a desktop-like application to manage the large feature set. The main screen for our 1st alpha release looked something like this:

The design is similar to a complex desktop app such as Thunderbird or Eclipse. The app has a menu bar, a left panel with data/navigation, a bottom panel with several features, and a main tabbed panel with content.

Our users hated it. We quickly learned that our users don't want desktop-style apps with many features spread across multiple panels. Instead, they are looking for web-style apps where each function of the application can be performed on a single, simple screen, and links are used to navigate between feature screens.

Second Try

Our revised UI eliminated the menu bar. The left and bottom panels are still available for power users, but they are now hidden by default. These simple changes dramatically improved user acceptance of the app.

Forms

Flex's form system is great for building complex forms (as long as you can live with the standard theme). Flex has built-in validator classes that can provide real-time visual feedback to users as they enter information into the form. It is very important to customize your validators to provide useful messages to your users, so they know how to correct their input.

Always make sure your forms are fully navigable by keyboard, and make sure that by default, the form will be submitted when the user hits 'Enter'.

SLM has many features, and some of the forms rely on optionally selected fields. For example, if a user selects an option from a drop-down list, then other fields in the form become enabled or disabled. Many desktop-style apps 'grey out' the disabled fields (this can be done using Flex UIComponent's 'enabled' property), but we found that unusable fields that remain visible can be confusing for users. Instead, adding or removing the optional fields feels much more intuitive.

Visual Cues

Animations are not just eye-candy; they can also help users navigate through the system. For example, SLM has several drag-and-drop features. At first, some users were unsure about which items could be dragged into a drop-zone. To fix this problem, we added a simple glow animation to highlight draggable items whenever the user's mouse is hovering over the drop-zone.

Trust

Customers use SLM to submit samples for testing at our facility. The submission process is similar to a shopping cart check-out on a retail site. It is extremely important to allow a user to move forward and backward through the process. Users don't feel confident unless they trust the system to move back and forth between steps without munging any data they've already entered. If a user needs to change an option in step 2, it shouldn't affect the data they've already entered in step 3. Users must also be confident that nothing will be stored on the server until the final submit step.

For single-step operations, the UI must always provide a 'Cancel' option or some other way for a user to back out of the changes they've made.