Polymorphic associations can drive you insane
With apologies to Joseph Stein:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
OK, so maybe I’m over-reacting (actually, there’s no maybe about it), but I’m coming to agree with Bill Karwin: Rails polymorphic associations are an anti-pattern.
Bill’s point is certainly valid: SQL DBMSs have a firm separation between meta-data and data, and any attempt to finesse this separation will cause you to lose some of the significant features of SQL (e.g., foreign key constraints).
But I’ve never been a SQL lover (I still have a soft spot for Quel), and my current environment is all-Rails-all-the-time, so issues around declaring foreign keys and SQL constraints are less applicable: I can rely on Rails features instead of DBMS features.
So let me tell you why they’re an anti-pattern in our case. Imagine that you have a model that uses a non-integer primary key (uh-oh, that’s not Convention):
1 2 3 4 5 6 7
1 2 3
(Github has the working source for this article.)
Imagine you have a few other models that use normal auto-incremented
id fields. For the sake of this
example, let’s use a silly User table:
1 2 3 4 5
Now, since you’re a good open-source Rails coder who doesn’t repeat oneself — nor anybody else you can find on Rubygems or via StackOverflow — imagine you’d like to use the audited gem to audit changes to these tables. We will have two problems and a foreshadowing:
- We’ll have to patch the gem.
- And then the generated SQL will only work for very slow definitions of “work”
- Unless we upgrade out of Rails 3.2
We need to patch the gem
The gem itself will want to create a model like this (taking some liberties with the source):
1 2 3 4 5 6 7 8 9 10 11 12 13
It will also generate associations like:
1 2 3 4 5 6 7
which will immediately fail when we try to save a Case, since the identifier of the Case model is not an integer.
So we modify our generated AuditMigration to make the
auditable_id a String
instead. So far so good.
Through the magic of Convention, this will mostly work: audited records will be written whenever we change a Case or a User.
And any given audit record will be able to get its
auditable object, since
it has a perfectly good table name and id.
But ActiveRecord — at least in Rails 3.2 — is really conventional. And in particular, it doesn’t care what the declared type of a database field is, or even what comes along with a column definition, unless it really has to. It just trusts the objects you pass in to be the right type. So let’s say you want to find the latest version of a Case. You’ll get what you want, since the case id is actually a string:
1 2 3 4
and MySQL will use the
auditable_index since it can see that it has a string value for equality comparison on the
But now you do the same thing from User. Audits hasn’t changed its declaration or anything, but
1 2 3 4 5 6
Looks good, no?
No. Look right after that first equal sign in the SELECT. MySQL (tested with 5.5 and 5.6) apparently sees that equality comparison to 1 as roughly:
That is, it has no idea that you meant
1. And since we’re
effectively transforming every value in the table in order to make the
comparison, MySQL won’t use an index on it.
You can see that by comparing the EXPLAIN PLAN output for both of those queries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
In our situation, where we have millions of audit entries, that’s a disaster.
I could rail against MySQL here instead of ActiveRecord: MySQL should return an error since you used an integer constant where a string constant is required (and AFAICT, the SQL “standard” [is there any standard more honored in the breach?] requires an error here). But I also think this is another case where Rails’ conventional thinking breaks down as soon as you do anything out of the ordinary: why isn’t it looking at that column definition and casting the constant’s value to a string in Ruby?
What are the alternatives?
Be relational, not object-oriented
Mr. Karwin suggests that the
whole approach of the gem is wrong, since it uses SQL in a non-relational
manner. But finding an alternative is complicated: this isn’t the only
polymorphic assocation on the
Audited model. There are three:
auditableassociation to the object modified
associatedassociation to an object that the auditable object was associated with (to make it easier to find all changes to an object that has many associations, e.g., all order changes for a customer).
userassociation to the party responsible for the change.
Since SQL itself never mixes meta-data with data, all of these are out-of-band from the database’s perspective: the associations only make sense to ActiveRecord, not to general SQL users, visualization systems, etc. In our case, we don’t care — we don’t access this database with anything other than ActiveRecord.
Karwin suggests three alternatives:
- Exclusive Arcs: Add a column for each type, and get rid of the type columns.
That’d work if MySQL could add columns responsibly (i.e., without locking the
table and modifying every record), but MySQL (InnoDB) is just getting around to having
implemented that 1988-vintage Oracle feature (in 5.6 we can avoid the lock but not the
time spent modifying every record). It might also require the
auditedgem to get a bit more involved in the declaration of associations in its audited models. This still relies on non-SQL constraint checking, since for the columns implementing each of the three associations, there can be only one non-NULL value. But foreign keys could be declared, and we still only have one audits table.
- Reverse the Relationship: Create a
table for each combination of types. This has the advantage of allowing proper
SQL constraint checking, since each foreign key can be declared at the database
level. From a query perspective, it’s the alternative of least surprise, since
it also avoids the use of
the other hand, if we want to maintain an order of operation (
version) across all the audits for a specific table, we have to use a multi-table sequence of some sort (no relying on
AUTO_INCREMENT). And every new auditable, association, or user class multiplies the number of migrations we have.
- Concrete Supertable: If I understand this proposal correctly,
the idea would be to create an ‘auditables’ table. Any creation of an auditable
cases) would also create one tuple in
auditables. Then to get the audits for my Case, I join through its
auditableentry to get its audits. I.e.,
:case belongs_to :auditable, and
:auditable has_many :audits. But how would those audits reference the other objects? Hmm. Still reading.
Avoid the ORM
For the audits of integer-keyed models, replace any uses of the ‘audits’
accessor with a method that gives Active record the
id as a string.
Unfortunately, I couldn’t find a way to patch
audited to do this
(overriding the generated association accessor wasn’t reliable: it’d work in
development and production, but not with rspec, for example).
This is the solution we used: we just used a different method name to implement the access we needed.
Fix the ORM
This turns out the be a pretty good answer: In Rails 4.2, Arel generates the correct query for the type of the column:
1 2 3 4 5 6 7 8 9 10 11 12 13
Unfortunately, we can’t easily move the application we have problems with from Rails 3.2 to Rails 4.2.
Don’t treat audits as relational data at all
I’ve been completely avoiding a major issue with the
audited gem: what
exactly does it record about a change in the Case model above?
Well, it records changes to its instances. For any update,
audited records a JSON
string representing a hash (dictionary); that hash has
a key for each changed field, and the value is either a scalar
(for a change from NULL, including creation), or a two-entry array
identifying the previous and new values. How “relational” is that? By
SQL’s conventions, it’s not relational at all. It’s yet another melding
of meta-data with data.
So why use a relational DB to record it? One good reason: to avoid two-phase commit while still maintaining consistency. That is, you want the every change audited, and you want to be certain that every audit represents a real change. As soon as you use a second resource manager (message queue, document store, whatever), you have to deal with making both durable in a single atomic action (or recording reverses on rollbacks).
Other than that, I can’t think of any good reason. But, uh, I’m in the privacy business. Audits are important, so consistency is reason enough.
On the other hand, we can certainly treat the audits as a “big blob” of append-only data, ETL it somewhere, and stop querying it in MySQL at all….