Thursday, September 06, 2007

Is preloading child tables always a good idea?

Optimization isn't something you should do too early on, but I think a little housekeeping every so often to make sure your pages aren't ridiculously slow is healthy. With any optimization task, you'll want to benchmark the results and see whether there's an actual gain. The most basic tool for benchmarking is the ordinary script/performance/benchmarker. The easiest analysis tool to find is the rails_analyzer gem. The last time I used rails_analyzer, it wasn't that easy to use; the command line arguments seemed arcane. But its bench tool, which can benchmark controllers as opposed to just object models, is fairly easy to use.
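For a quick sanity check from script/console, Ruby's standard Benchmark library works too. This is just a minimal sketch timing the find_books class method defined further down, not a full request benchmark:

require 'benchmark'
# Prints user/system/real times for one call to the finder.
puts Benchmark.measure { Book.find_books }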

Using the bookmarking example from before, let's say you have something like:
class SceneController < ApplicationController
  def list
    @books = Book.find_books
  end
end

class Book < ActiveRecord::Base
  def self.find_books
    find(:all, :include => [:bookmarks],
         :conditions => ["books.created_on > ?", 6.months.ago])
  end

  def bookmarked_by?(user)
    # With :include, the bookmarks are already in memory, so this is a
    # Ruby scan rather than another database query.
    bookmarks.any? { |bm| bm.owner_id == user.id }
  end
end

In the listing of books, the view displays whether each book is bookmarked by the current user or not. Normally, without the :include, the listing would make a separate query to the DB for every book list element, since it calls bookmarked_by?(user) to determine whether the user bookmarked that book. So instead of just 1 query, it would make n + 1 queries.
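For illustration, the list view might iterate over the books like this (a hypothetical sketch; current_user and the title attribute aren't part of the original example):

<% @books.each do |book| %>
  <%= book.title %>
  <%= 'bookmarked' if book.bookmarked_by?(current_user) %>
<% end %>

Each bookmarked_by? call is where the extra query would happen if the bookmarks weren't preloaded.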

Preloading child tables isn't necessarily wise all the time. It really depends on what you intend to do with the data after you fetch it. As the Agile Rails book warns, preloading all that data takes time, and if you look at your log files, you'll see that it can be a significant amount.

If you're only going to load a limited number of these book list elements on a single page at a time, it actually might make sense to forgo preloading the child tables and have bookmarked_by? issue a small find() instead of filtering a preloaded collection.
class SceneController < ApplicationController
  def list
    @books = Book.find_books
  end
end

class Book < ActiveRecord::Base
  def self.find_books
    find(:all, :conditions => ["created_on > ?", 6.months.ago],
         :limit => 20, :order => "created_on desc")
  end

  def bookmarked_by?(user)
    # Without preloading, each call issues one small lookup query.
    Bookmark.find(:first,
                  :conditions => ["book_id = ? and owner_id = ?", id, user.id]) ? true : false
  end
end

And if you're going to display counts of associated records, by all means, use counter caching. It's easy to do (as long as you follow the instructions!) for most situations.
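As a rough sketch (assuming you add an integer bookmarks_count column to the books table, defaulting to 0), the association side is just:

class Bookmark < ActiveRecord::Base
  # Keeps books.bookmarks_count up to date on create/destroy.
  belongs_to :book, :counter_cache => true
end

With that in place, book.bookmarks.size reads the cached column instead of issuing a COUNT query on every page render.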

Intuitively, once you want to display more than some number n of book list elements, it makes more sense to use :include and preload. However, I wanted to point out that when you make decisions like this, you'll always want to measure the load times, because you get what you measure.

Also, use the right number of runs. The fewer times you run a request, the more variation you'll see in your benchmarks. Let's say you get these two numbers for two different methods.
$ bench -u http://localhost:3000/method1 -r 50 -c 5
50....45....40....35....30....25....20....15....10....5....
Total time: 240.383527755737
Average time: 4.80767055511475

$ bench -u http://localhost:3000/method2 -r 50 -c 5
50....45....40....35....30....25....20....15....10....5....
Total time: 156.147093772888
Average time: 3.12294187545776

So it's obvious that method2 is better, right? Well, not necessarily. These benchmarks only show averages; you'll also need to pay attention to the standard deviation. The bigger the standard deviation, the more runs you'll need to pin down the average load time and the number of decimal places you can trust. That way, you can figure out whether the difference in load times is statistically significant or not.
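If your tool doesn't report it, a back-of-the-envelope calculation over raw per-request timings is enough (the numbers below are made up for illustration):

per_request = [4.2, 5.1, 4.9, 4.6, 5.3]  # hypothetical timings in seconds
mean = per_request.inject(0.0) { |sum, t| sum + t } / per_request.size
variance = per_request.inject(0.0) { |sum, t| sum + (t - mean) ** 2 } / per_request.size
stddev = Math.sqrt(variance)
puts "mean=#{mean} stddev=#{stddev}"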

That way, you can ascertain whether the optimizations you made were worth the trouble or not.
