Before working with Django at Kogan I used SQLAlchemy. One of the many features I liked about SQLAlchemy was you had the freedom to join tables on any clause. This is especially useful for when you have a not-quite-normal schema and the data almost matches, for example, matching a denormalised ID from different systems. The […]
Before working with Django at Kogan I used SQLAlchemy. One of the many features I liked about SQLAlchemy was you had the freedom to join tables on any clause. This is especially useful for when you have a not-quite-normal schema and the data almost matches, for example, matching a denormalised ID from different systems.
The trick in Django is to use the undocumented
ForeignObject (the base class of
ForeignKey) which allows for more flexibility when joining. Using the
ForeignObject on its own will attempt to create a new column in the database which we don’t want. Setting
private_only=True will let us use a “virtual column”!
Here’s a snippet we use to set up joins:
class Relationship(models.ForeignObject): """ Create a django link between models on a field where a foreign key isn't used. This class allows that link to be realised through a proper relationship, allowing prefetches and select_related. """ def __init__(self, model, from_fields, to_fields, **kwargs): super().__init__( model, on_delete=models.DO_NOTHING, from_fields=from_fields, to_fields=to_fields, null=True, blank=True, **kwargs, ) def contribute_to_class(self, cls, name, private_only=False, **kwargs): # override the default to always make it private # this ensures that no additional columns are created super().contribute_to_class(cls, name, private_only=True, **kwargs)
This keeps the original column intact (so your production code won’t need a huge refactor) and allows you to prefetch or select_related other attributes off the referenced table.
Here’s a hypothetical situation where you might use this. Say you’re working on a legacy system (because you’d never make these mistakes now!). You’ve got a table
customers which has
customer_number which was generated by an external system. You’ve also got another table
sales which uses
customer_number as a foreign key. Unfortunately,
customer_number might reference something that doesn’t exist in the
customers table as it was dated a long time ago.
CREATE TABLE customers ( customer_number VARCHAR(100) NOT NULL UNIQUE, name VARCHAR(100) ); CREATE TABLE sales ( id int NOT NULL, customer_number VARCHAR(100) NOT NULL, PRIMARY KEY (id) ); INSERT INTO customers (customer_number, name) VALUES ("CUST01", "Steven"), ("CUST02", "Amy"); INSERT INTO sales (id, customer_number) VALUES (1, "CUST01"), (2, "EXT-01");
How could you perform a join on these?
Let’s start with raw SQL.
SELECT id, name FROM sales LEFT JOIN customers ON sales.customer_number = customers.customer_number;
In SQLAlchemy, you could use a custom join condition:
q = session.query(Sales).join(Customer, Sales.customer_number == Customer.customer_number)
In Django, you can now do this:
Class Sale(models.Model): customer_number = models.CharField(max_length=100) customer_reference = Relationship("Customer", from_fields=["customer_number"], to_fields=["customer_number"]) Sale.objects.values_list("id", "customer_reference__name")
We’ve found this to be very useful for our own legacy schemas by being able to optimise blocks where multiple queries were previously necessary.