Monday, April 09, 2007

Updating just the join table

Having a model that has a has_and_belongs_to_many relationships with another model affords you the convenience of a bunch of added on methods that get created when you define the relationship. These are all pretty nice. But I found that I had to forgo these methods for a more crude method.

Let's say you have two models, taken from the Rails book: Article and User.

class Article < ActiveRecord::Base
has_and_belongs_to_many :users
end

class User < ActiveRecord::Base
has_and_belongs_to_many :articles
end

In order to create a new article and associate it to a user right away, you can use create!:
user = User.find(session[:user].id)
user.articles.create!(:title => "The Art of FizzBuzz")

But sometimes, an article might be linked to other models as well. Let's say that there's a Shelf model, and an Article habtm Shelves too. Then, you'd have to pull something like:
user = User.find(session[:user].id)
shelf = Shelf.find(params[:shelf_id])
article = user.articles.create!(:title => "Go and foobar yourself")
shelf.articles << article


Now, that last line is tricky. It's adding the new article to the articles of a shelf. Technically, it should just be inserting ids in the join model. However, that's not the case. It will ask shelf to load all its articles first, and then update the join table. Now, if you're going to manipulate articles of that shelf later on in the controller method, I think this would be the way to go.

However, if you're importing articles from the net, that might not work so well. In that case you just needed to add the association in articles to shelves in the join table. The current implementation of <<, concat, and push seems to enforce an explicit query for it at least once.

Therefore, if "shelf" has a lot of articles, then you'll experience a large slowdown in importing your articles--for every new article, you're asking the database to return a list of all current articles on that shelf. Database caches common queries, but in this case, it doesn't help, since you're importing a new article every time, which can belong to different shelves. But the time you come back to the same shelve, it may have been cleared from the cache already.

This is very much like Joel's story about Shlemiel the Painter. It's not that <<, concat, push is implemented poorly, but that it's used for a different scenario with different assumptions--that you're going to be doing other things to the collection within the scope of the controller method.

The only solution I've come up with is an ugly one. I created a model out of the join table, and added a method called link. It finds the associated link, and if it doesn't find one, it creates it.
class ArticlesShelves < ActiveRecord::Base
def self.link(article, shelf)
find_by_article_id_and_shelf_id(article.id, self.id) ||
create!(:article_id => article.id, :shelf_id => shelf.id)
end
end


This has lowered the importing of articles from a minute and a half for each article belonging to a shelf with lots of articles, to about 0.5 second for each article on a low powered machine. I personally don't like this solution, since it introduces a very specialized model object with only one purpose, rather than a cohesive set of responsibilities.

While it is possible to push the method "link" to both Article and Shelf, I'm not sure exactly how to query for just the join table if the active record counterpart ArticlesShelves does not exist, other than using find_by_sql(). But even then, how do you execute an "insert" SQL query?

If you've got a better solution, let's hear it. :)

No comments:

Post a Comment