Sunday, January 2, 2011

Django ORM Tools

The Django ORM is a great tool that makes it easy to work with simple data models, but it quickly shows its limitations as the complexity of the data model grows. The orm_tools module is an attempt to keep the simplicity of the Django ORM, while adding some extra features that make it much easier to work with complex object graphs. The code is available on Django Snippets.

Object Instances/Sessions

The Django ORM loads each object separately from the database. If different QuerySets select multiple objects with the same primary key, the resulting objects will all be different instances.


>>> MyModel.objects.get(pk=1) is MyModel.objects.get(pk=1)
False


The SQLAlchemy ORM solves this problem with sessions. orm_tools contains a Session class to provide similar functionality. Use the 'with' statement in combination with a Session instance to force QuerySets to retrieve cached object instances from the session.


>>> from orm_tools import Session
>>> with Session():
...     MyModel.objects.get(pk=1) is MyModel.objects.get(pk=1)
...
True


When QuerySet objects are executed inside of the 'with' block, all SQL queries are performed normally, but cached object instances are returned if an instance with an identical primary key already exists in the session. The session applies throughout any code called from within the 'with' block. Any objects inserted into the DB within the 'with' block are automatically added to the session.
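
To make the caching behaviour concrete, here is a minimal identity-map sketch in plain Python. It is not the orm_tools implementation: the names identity_session, resolve, Row, and fetch are hypothetical, and the "query" is simulated by constructing a new object each time. It only illustrates the idea that every fetch still runs, but an already-cached instance with the same primary key is handed back instead of the fresh one.

```python
import threading
from contextlib import contextmanager

# Hypothetical names throughout; this is an illustration, not orm_tools code.
_local = threading.local()


@contextmanager
def identity_session():
    """Activate a per-thread identity map for the duration of the block."""
    previous = getattr(_local, "cache", None)
    _local.cache = {}
    try:
        yield _local.cache
    finally:
        _local.cache = previous  # restore the outer session, if any


def resolve(model_name, pk, fresh_instance):
    """Return the cached instance for (model_name, pk), caching it on first sight."""
    cache = getattr(_local, "cache", None)
    if cache is None:
        return fresh_instance  # no active session: every fetch is a new object
    return cache.setdefault((model_name, pk), fresh_instance)


class Row:
    """Stand-in for a model instance freshly built from a database row."""

    def __init__(self, pk):
        self.pk = pk


def fetch(pk):
    """Simulate a query: always build a new object, then consult the session."""
    return resolve("Row", pk, Row(pk))
```

Inside a 'with identity_session():' block, fetch(1) is fetch(1) holds; outside the block, each call returns a distinct object, which mirrors the two interpreter sessions shown above.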

Object Graphs

The Django ORM does not automatically save model object dependencies, so Django model instances must be saved one at a time.


>>> parent = MyModel()
>>> child = MyChild(parent=parent)
>>> child.save()
IntegrityError: app_mychild.parent_id may not be NULL


For simple data models, this problem is easily fixed by inserting the models into the database at the same time that they are created.


>>> parent = MyModel.objects.create()
>>> child = MyChild.objects.create(parent=parent)


However, this is not always ideal for more complex data models, especially when the objects involved already exist in the database and changes must be persisted by updating existing rows. orm_tools contains a GraphSaver class that will save an entire object graph at once.


>>> from orm_tools import GraphSaver
>>> parent = MyModel()
>>> child = MyChild(parent=parent)
>>> saver = GraphSaver()
>>> saver.save(child)


When the 'save' method of the GraphSaver object is called, all dependencies are detected and their 'save' methods are called in the correct order, so that the entire object graph is saved. The GraphSaver's 'save' method works equally well for both inserts and updates, although updates can optionally be skipped by setting the 'update' argument to False. In the future, I hope to increase performance significantly by modifying the code to execute batched insert/update queries for databases that support it (PostgreSQL with psycopg2).
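
The "correct order" here means that an object is saved only after everything its foreign keys point to. A minimal sketch of that ordering, using a depth-first walk over plain Python objects, might look like the following. The Node class and save_order function are hypothetical stand-ins, not GraphSaver's actual code, and the sketch only follows foreign-key-style references and does not try to resolve circular dependencies.

```python
class Node:
    """Stand-in for a model instance; 'parents' are its foreign key targets."""

    def __init__(self, name, *parents):
        self.name = name
        self.parents = list(parents)


def save_order(root):
    """Return the objects reachable from root, each after all of its parents."""
    ordered, seen = [], set()

    def visit(obj):
        if id(obj) in seen:
            return  # already scheduled (or currently being visited)
        seen.add(id(obj))
        for parent in obj.parents:
            visit(parent)  # dependencies are scheduled first
        ordered.append(obj)

    visit(root)
    return ordered


grandparent = Node("grandparent")
parent = Node("parent", grandparent)
child = Node("child", parent, grandparent)
print([node.name for node in save_order(child)])
```

Calling 'save' on each object in the returned order would insert the grandparent first, then the parent, then the child, so no row is written before the row it references.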

Collections

The Django ORM supports one-to-many object relations. Objects on the 'many' side of a one-to-many relation cannot be attached to the 'one' unless the 'one' is already saved in the database. This causes some of the same problems as described in the 'Object Graphs' section. The orm_tools module contains a Collection class that enables 'many' objects to be added to a 'one' object, regardless of whether the 'one' object has been saved yet.


from django.db import models

from orm_tools import Collection

class One(models.Model):
    label = models.CharField(default='blank', max_length=20)

# Call the 'set_property' static method
# to create a collection object.
#
# Arguments
# ==========
# * Model to add collection to
# * Collection attribute name
# * Many's foreign key attribute name
# * One's 'many set' attribute name
Collection.set_property(One, 'children', 'parent', 'many_set')

class Many(models.Model):
    label = models.CharField(default='blank', max_length=20)
    parent = models.ForeignKey(One, null=False)



>>> one = One()
>>> one.children.add(Many())
>>> one.children.add(Many())
>>> saver = GraphSaver()
>>> saver.save(one)


The Collection object can be iterated through, indexed, and sliced regardless of whether the 'one' object and the 'many' objects have been saved yet. The GraphSaver's 'save' method will also automatically save all 'many' objects.
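
As a rough illustration of how such a collection can work before anything is saved, the sketch below keeps the 'many' objects in an in-memory list and sets each one's foreign key attribute when it is added. PendingCollection, Owner, and Item are hypothetical names in plain Python; the real Collection class is attached with set_property and also has to cooperate with Django's related manager once the objects are saved, which this sketch does not attempt.

```python
class PendingCollection:
    """List-like holder for 'many' objects attached to a possibly unsaved 'one'."""

    def __init__(self, owner, fk_name):
        self.owner = owner
        self.fk_name = fk_name
        self._items = []

    def add(self, item):
        # Point the item's foreign key attribute at the owner, then remember it.
        setattr(item, self.fk_name, self.owner)
        self._items.append(item)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]  # handles both integer indexes and slices


class Owner:
    """Stand-in for the 'one' side of the relation."""

    def __init__(self):
        self.children = PendingCollection(self, "parent")


class Item:
    """Stand-in for the 'many' side of the relation."""

    parent = None
```

With these stand-ins, one = Owner() followed by one.children.add(Item()) leaves the item's parent attribute pointing at one, and one.children can be iterated, indexed, and sliced without touching the database.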