Saturday, September 29, 2007

Mushkin - Get More

If you follow my blog, you know it's not often that I plug products or services. But sometimes, when an experience is just overwhelmingly positive, you have to give credit where credit is due.

Three years ago, I bought a Dell Dimension 2400. It's a 2.66GHz P4 computer--paltry even by yesterday's standards, but it was cheap...about $500. I bought it on a whim, and since then, it's become my main development machine. However, I had only bought it with 512MB of RAM...which I thought was more than enough at the time.

Enter Firefox 2, stage left. The elephant that easily eats up 230MB.

Like I lamented before, I've been having productivity slow-downs due to applications using an extraordinary amount of memory. I was actually watching screens redraw themselves because the swap partition was being exercised like crazy. I had sworn off Eclipse because I felt that, for what I used it for, it was eating up too much memory. But at the end of the day, what room was left by Eclipse was taken up by Firefox 2.

In the end, I succumbed and went out to buy more RAM for my machine while waiting for Firefox 3. But it's been so long (2 years) since I bought hardware that I didn't know what to get. Kudos to Dell for providing easy ways to look up what RAM you need by model number. I used to revel in being able to look up specs to find the lowest price, but I simply don't have the time anymore. Dell was offering 1GB of RAM for $109. Seemed reasonable. But I decided to check Mushkin.

I've used Mushkin since my sophomore year in college (8 years ago!), and none of the RAM I've purchased from them has ever failed. Believe me, RAM fails. Probably not as fast as hard drives, but RAM fails. And when it does, it's the last thing you assume has failed. As a result, you waste so much time in your dorm room debugging instead of out there mixing it up with the co-eds.

I revisited Mushkin for the first time in a long time, and I have to say that while they have a dizzying array of RAM for hardcore hardware enthusiasts (what I used to be), they also have easy-to-find components for owners of stock computer models (like me now). Not only that, they offer the same type of RAM for $30 less. Tres awesome.

Normally, if I just came across Mushkin out of the blue, I don't know if I'd trust them. But having used them all these years, and given that their RAM keeps on ticking, they clearly stand by their quality, however they do it. Not only is their stuff quality, their shopping experience is a breeze too.

So I'd recommend their stuff. If I get Mobtropolis to a point where I need huge servers, I know where I'm getting my RAM.

Ok, next time will be more coding-related stuff, as I have some interesting posts that have been gathering in the queue. Stay tuned!

Wednesday, September 26, 2007

[115, 117, 109, 109, 97, 114, 105, 122, 97, 116, 105, 111, 110].map{|c| c.chr}.join

Time does fly by when you don't work on the weekend. It's been almost a week since my last post--so you know I've been busy coding. Today, I got stuck on caching, so here I am, blogging.

In Nerd Time 8, I mentioned the algorithm for content-aware image resizing. For those of you who didn't hear about it a couple weeks ago, watch the movie. It seems pretty magical at first. It basically computes an energy function over the image to decide which parts it can cut out if it needs to.

I don't know if the rest of you had the same thought, but content-aware image resizing is essentially image summarization. You're throwing out less important information in the picture in favor of preserving informational features of the image.

They use an energy function as a metric to determine which seams--and in what order--to remove from the picture to reduce its size.

The most surprising thing to me was that the basic energy function is just the magnitude of the gradient. A gradient function of an image tells you how fast the colors are changing as you move across the image. This means the sky would be smooth and slowly varying (low frequency), while the trees would be rough and vary quickly (high frequency). The basic gradient energy function therefore lets you selectively cut out the low frequency parts of the image, while the seam selection preserves the aspect ratio and image coherence.
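
To make that concrete, here's a minimal sketch of a gradient-magnitude energy function, assuming the image is just a 2D array of grayscale values (the grayscale simplification and the names are mine; the paper works on color images and then carves out minimal-energy seams with dynamic programming):

def energy(img)
  h, w = img.size, img[0].size
  (0...h).map do |y|
    (0...w).map do |x|
      # central differences, clamped at the image borders
      dx = img[y][[x + 1, w - 1].min] - img[y][[x - 1, 0].max]
      dy = img[[y + 1, h - 1].min][x] - img[[y - 1, 0].max][x]
      Math.sqrt(dx * dx + dy * dy)
    end
  end
end

Smooth regions like sky come out with low energy, and busy regions like trees with high energy, so the low-energy pixels are the first candidates for removal.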

Apparently, this metric works pretty well, even compared to other metrics like entropy, which is the standard measure of information content. It works mostly because of the assumption that high-content areas of the scene will be high frequency, while backgrounds like sky, road, and wall are generally low frequency. If you had a picture of me and someone else holding up a flag with a forest as a backdrop, the gradient energy function would cut out the flag first, not the trees.

This puts text summarization into clearer focus for me. There are two competing goals in text summarization: 1) reduce the amount of text, and 2) keep the information content high and coherent. Content-aware image resizing achieves both goals by finding a computable metric that distinguishes the important from the unimportant. By analogy, we should be able to do the same with text.

However, we don't know what, if any, the gradient between words means, and how that would fare as a measure of information content. We also don't have a good way of judging coherency of a piece of text--different people will come up with different summaries. In an image, we can look and just tell. This is because we judge all pieces of an image in parallel, and we have a database of images to compare to in our heads to tell if something 'looks right' or not.

One can tell how far apart one color is from another simply by measuring the distance between the hex values that represent those colors.
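
For instance, a rough sketch of that color distance, treating six-digit hex colors as three RGB components (color_distance is a hypothetical helper, just for illustration):

def color_distance(a, b)
  ar, ag, ab = a[0, 2].hex, a[2, 2].hex, a[4, 2].hex
  br, bg, bb = b[0, 2].hex, b[2, 2].hex, b[4, 2].hex
  Math.sqrt((ar - br) ** 2 + (ag - bg) ** 2 + (ab - bb) ** 2)
end

color_distance("ff0000", "fe0000")  # => 1.0 (two nearly identical reds)
color_distance("ff0000", "0000ff")  # => ~360.6 (red vs. blue)
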
Words that have similar letters, however, may have completely different meanings.
The difference between the almost right word & the right word is really a large matter--it's the difference between the lightning bug and the lightning. - Mark Twain
I suspect one would need a gradient map for words, or a way to use the etymology of words to measure how far apart their meanings are. Generating such a map has proven difficult, as far as I know.

Many people have used co-occurrences of words to map words to meanings, since it makes sense that words related to each other would appear in the same text. However, it was found that even if two words had the same meaning, they might have very different frequencies of occurrence, throwing off the validity of the gradient map.
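
As a toy illustration of the co-occurrence approach (my own sketch, counting word pairs that share a sentence; real systems use large corpora and sliding windows):

def cooccurrences(text)
  counts = Hash.new(0)
  text.downcase.split(/[.!?]/).each do |sentence|
    words = sentence.scan(/\w+/).uniq.sort
    words.each_with_index do |a, i|
      words[(i + 1)..-1].each { |b| counts[[a, b]] += 1 }
    end
  end
  counts
end

cooccurrences("The sky is blue. The sea is blue.")
# => counts pairs like ["blue", "sky"] and ["blue", "sea"] once each

Words that keep turning up together get high counts, which is taken as evidence that their meanings are related--but as the paragraph above notes, a rare synonym and a common one will have very different counts.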

Wednesday, September 19, 2007

Diving into Rails source and explaining alias_method_chain in pictures

I've been involved with the local Ruby group, and I did a talk on diving into the Rails source code. I have to admit, I probably should have dived into it earlier, but for someone who was doing engineering work before this, there was a lot to learn just by using Rails. Originally, I was going to just relate my experience of when I had to dive into the code, but then I happened upon Rails' Unusual Architecture, and it made a good topic for introducing the metaprogramming used in Rails.

At first, it was pretty confusing, since I came from a C++ background and was used to the idea of design patterns. Well, there are patterns; it's just that in dynamic languages, many of the traditional Gang of Four design patterns go away.

I'll outline the meat of the talk. Rails doesn't use the Decorator pattern. Instead, it basically renames methods in place to inject additional functionality that wraps around the original functionality. How does it do this? It uses something called alias_method_chain, detailed here and here.

And inside alias_method_chain is a built-in method called alias_method. Since everything in Ruby is an object, and changeable, the methods of modules and classes are changeable too. alias_method basically makes a copy of the old method under a new name.
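
Stripped down, alias_method_chain itself is just two alias_method calls. Here's a sketch (the real Rails version also handles methods ending in ? and !):

class Module
  def alias_method_chain(target, feature)
    # copy the original method to target_without_feature...
    alias_method "#{target}_without_#{feature}", target
    # ...then point the original name at target_with_feature
    alias_method target, "#{target}_with_#{feature}"
  end
end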

I made pictures to show the progression. Let's say we have a method called "save" that we want to enhance with "validation".


We start with ActiveRecord and the Validation module. In our Validation module, we make a method called "save_with_validation" (purple heart) that calls a method called "save_without_validation" inside it. "save_without_validation" doesn't exist yet, because we haven't called alias_method_chain().


Then our first step is to include the Validation module inside of ActiveRecord.

Then, when we call alias_method_chain(:save, :validation), it uses alias_method to make a copy of the original save method and name it "save_without_validation".


Then it renames the method inside Validation from "save_with_validation" to "save". This way, any code calling save() on ActiveRecord will execute the new save, which does validation first and then turns around and calls the original save (green sun). The caller won't know any different, but in fact code was injected between it and the original save (green sun), inside the new "save" (purple heart). And the original save doesn't know any different either, since nothing's changed from its point of view.

In code, it'll look something like this:
class ActiveRecord
  def save
    # do saving stuff
  end
end

module Validation
  def save_with_validation
    # do validation
    save_without_validation
  end
end

class ActiveRecord
  include Validation
  alias_method_chain :save, :validation
end

If we 'execute' the above as we did in the pictures, it ends up being equivalent to:
class ActiveRecord
  def save_without_validation
    # do saving stuff
  end

  def save
    # do validation
    save_without_validation
  end
end

I'm not sure how I feel about it yet, but at first glance, it's a nice pattern once you know what's going on. At first, I kept seeing methods being called in the Rails source that aren't actually defined anywhere; it turns out that's because of metaprogramming like this. I think others have said it's a bad idea because it's hard to inject functionality into the middle of a chain that already exists: if you pick up one link, you pick up everything before it. One might argue the same is true of the Decorator pattern.

In any case, you'll see this repeated over and over again in the Rails source, so hopefully this'll give you some idea of what's going on if you ever decide to go Rails spelunking.

Sunday, September 16, 2007

Syntactic sugar for dealing with empty containers

In any web application, we're often just reading a collection of rows from the database and displaying them in the browser. Oftentimes, we'll have code that looks like this:
<% unless @friends.empty? -%>
  <% @friends.each do |friend| -%>
    <li><%= h friend.username %></li>
  <% end -%>
<% else -%>
  No friends yet
<% end -%>
I don't know why, but this kinda gets to me; it just doesn't look all that neat. I attribute it to having had to upgrade and maintain a piece of C server code that was nested 8 or 9 layers deep, all in one huge main(). It might be counter-productive, but I tried to see if I could do better.
<% if @friends.each do |friend| -%>
  <li><%= h friend.username %></li>
<% end.empty? -%>
  No friends yet
<% end -%>
Well, this is kinda nice in that it's only one hierarchy deep. When I look at it, one section is for what to display when there are elements in the list, and one is for when there aren't. I suppose your mileage may vary. However, I didn't like the "if" in front; it obscures the intent of displaying the list. So, in the pursuit of more counter-productivity, and perhaps in the spirit of pseudo-altering the language, I tried this out:
<% @friends.each do |friend| -%>
  <li><%= h friend.username %></li>
<% end.empty do -%>
  No friends yet
<% end -%>
Well, that worked, and I kinda like it. Since the message was so simple, I had wanted empty() to just take the message and display it, but because it's inside a "%" and not a "%=", the message won't get displayed, so I had to do it in a block. In a way, it's almost like being able to write my own "else" statement. If I had used curly braces instead of "do/end", it would look pretty close. Here's the code for empty:
class Array
  def empty(message = "")
    if self.empty?
      return block_given? ? (yield message) : message
    end
  end
end
Like it? Hate it? Tip!

Thursday, September 13, 2007

Unable to freeze rails due to problem in rake task

I think the current stable version of Rails is 1.2.3. For those of you using this version rather than Edge Rails, there's a little gotcha in the rake tasks.

Since I'm on a shared host, it's good practice to freeze your version of Rails into the vendor directory. You do this with a rake task: "rake rails:freeze:gems". But before you do that, if you're using SVN, you'll want to use "svn delete" to remove the vendor/rails directory. None of the rake tasks use svn delete; they all use "rm -rf", which in my experience makes SVN freak out when the .svn directory is gone.

However, even with that done, freezing a new version of gems was failing.

It was looking for Rails version 1.4.0 and failing to install it. And even worse, when you tried to run rake again, it said it couldn't find Rails at all!

Well, the latter problem was simple. A failed freeze leaves a blank vendor/rails directory, and if you look in 'config/boot.rb', it says:

if File.directory?("#{RAILS_ROOT}/vendor/rails")
  require "#{RAILS_ROOT}/vendor/rails/railties/lib/initializer"
else
  require 'rubygems'
  # ...blah blah blah...

So make sure you remove vendor/rails.

The former took a little bit of work digging around the rake tasks, and though it wasn't hard, I wasted about an hour. It turns out the culprit is that the default rake task uses Gem.cache.search('rails'), which returns all gems with 'rails' in the name.

I have a couple gems installed with the word 'rails' in them:

rails (1.2.3, 1.2.0, 1.1.6)
rails_analyzer_tools (1.4.0)
railsbench (0.9.2)

So it took the highest version among them, which was 1.4.0 (from rails_analyzer_tools), and tried to install rails 1.4.0, which doesn't exist!

To hot-fix it, open railties/lib/tasks/framework.rake, and in the gems task under the freeze namespace, change "Gem.cache.search" to "Gem.cache.find_name".

That way, it only finds 'rails', and not all the other gems with 'rails' in their names. This problem is already solved in Edge Rails, so no need to submit a patch. Tip!
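
You can see the difference from irb (the output below is from my set of installed gems, so yours will vary):

require 'rubygems'

# search does a substring match on gem names:
Gem.cache.search('rails').map { |gem| gem.name }.uniq
# => ["rails", "rails_analyzer_tools", "railsbench"]

# find_name matches the gem name exactly:
Gem.cache.find_name('rails').map { |gem| gem.name }.uniq
# => ["rails"]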

Tuesday, September 11, 2007

Nerd time, issue 8

I used to work at a research laboratory, where the fanfare and fads of the web aren't of great concern. But my coworkers liked to know what was going on outside the ivory tower, so after I quit, I started sending them an informal mailing list. I post it here just for fun, too.

---

So after a little hiatus deploying mobtropolis, nerd time is back. As
usual, easy reading is up top. This time it's on databases. DBs and
backends usually inspire yawns because, frankly, they're not
sexy--there are no pretty screens to look at. However, DBs are often a
bottleneck, and scaling beyond the usual DB configs has been a source
of pain for large-scale software. Here, I point out some relatively
obscure DB stuff on the horizon--after some easy reading and news.
And oh, if you don't want to get these anymore, just lemme know.

A group is its own worst enemy
Nothing to do with dbs. Just a classic piece of text on social
software. Easy reading, but good lessons for me when building
http://www.mobtropolis.com
http://www.shirky.com/writings/group_enemy.html

Firefox 3 with XUL runtime
I did comment on this; FF3 should be less prone to crashes than FF2.
The significance is much like Adobe's Integrated Runtime (AIR): web
devs will be able to create native desktop applications using the
usual web tools--HTML, JavaScript, ActionScript, XML, etc.
http://arstechnica.com/journals/linux.ars/2007/08/21/using-firefox-3-as-a-xul-runtime-environment
http://webjazz.blogspot.com/2007/08/using-firefox-3-as-xul-runtime.html

Adobe also open sourced their Photoshop engines. Offhand, I'm not
sure what one would do with it, unless there were some type of
innovative image manipulation--which you'll see in the next link.
http://opensource.adobe.com/group__asl__overview.html

Content-aware image resizing.
This is kinda neat. It uses energy functions to resize images while
keeping important content and cutting out background parts of the
images. If you only click on one link in this issue, make it this
one.
Update:
Well, what's interesting is that the basic energy function they used is simply a two-dimensional gradient. It's based on the assumption that the high frequency content of an image is usually what contains the information/foreground/interesting parts. This is probably true, and probably works for a large number of images. However, I think if you had an image of a flag with a forest as the background, it'd cut out the flag first.
http://www.youtube.com/watch?v=c-SSu3tJ3ns
http://www.faculty.idc.ac.il/arik/imret.pdf

Byte-serving is an aspect of the HTTP protocol that I didn't know
about. Apparently, you can request specific byte ranges of a file
over HTTP. Web-based bittorrent?
http://www.coneural.org/florian/papers/04_byteserving.php
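
A quick sketch of trying byte-serving from Ruby with Net::HTTP (the
URL is made up; any server that supports ranges should answer 206):

require 'net/http'
require 'uri'

# ask for only the first 100 bytes of a (hypothetical) big file
uri = URI.parse('http://example.com/big_file.iso')
res = Net::HTTP.start(uri.host, uri.port) do |http|
  http.get(uri.path, 'Range' => 'bytes=0-99')
end
puts res.code              # "206" if ranges are supported, "200" if not
puts res['Content-Range']  # e.g. "bytes 0-99/1048576"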

hBase - an open source clone of Google's Bigtable, Google's in-house
distributed database. I watched a video lecture about it one time,
and it seems pretty neat.
http://glinden.blogspot.com/2007/07/hbase-google-bigtable-clone.html

A free database of the world's spec-related knowledge in one place.
Oddly enough, it is populated with things. I'm not sure what
motivates people to enter things in, but probably the same motivation
as people contributing to Wikipedia. The neat thing about this is
that you can query it with an API.
http://www.freebase.com/signin/

CouchDB is a database that doesn't use relational tables; it's mostly
for documents. It's still in alpha.
http://couchdb.org/CouchDB/CouchDBWeb.nsf/Home?OpenForm

Ambition is an experimental Ruby gem that expresses SQL queries as
Ruby's Enumerable methods. Web devs seem pretty allergic to SQL in
general and have tried to build layers in between so the dev has one
less language to learn. Probably also the result of wanting a
3-tiered architecture.
http://errtheblog.com/post/10722
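
If I remember its README right, a query looks something like this (a
sketch from memory--don't hold me to the exact generated SQL):

require 'ambition'  # plus an adapter and an ActiveRecord User model, assumed here

User.select { |u| u.email =~ /chris/ }.to_s
# => something like: SELECT * FROM users WHERE users.email LIKE '%chris%'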

Mnesia is Erlang's distributed DB. I'm under the impression that it
doesn't use SQL. One queries directly by using Erlang tuples. I'll
have to learn more about this one.
http://www1.erlang.org/documentation/doc-5.0.1/lib/mnesia-3.9.2/doc/html/part_frame.html

Thursday, September 06, 2007

Is preloading child tables always a good idea?

Optimization isn't something you should do too early on, but I think a little housecleaning every so often, to make sure your pages aren't ridiculously slow, is healthy. With any optimization task, you'll want to benchmark the results and see if there's an actual gain. The most basic tool for benchmarking is the ordinary script/performance/benchmark. The easiest analysis tool to find is the rails_analyzer gem. The last time I used it, it wasn't that easy to use--the command-line arguments seemed arcane. But its bench tool, which can benchmark controllers as opposed to just models, is fairly easy to use.

Using the bookmarking example from before, let's say you have something like:
class SceneController < ApplicationController
  def list
    @books = Book.find_books
  end
end

class Book < ActiveRecord::Base
  def self.find_books
    find(:all, :include => [:bookmarks],
         :conditions => ["books.created_on > ?", 6.month.ago])
  end

  def bookmarked_by?(user)
    self.bookmarks.select { |bm| bm.owner_id == user.id }.empty? ? false : true
  end
end

In the listing of books, one would display whether each book is bookmarked by the current user or not. Normally, without the :include, the listing would make a query to the DB for every book list element it displays, since it uses bookmarked_by?(user) to determine whether the user bookmarked the book. So instead of just 1 query, it would make n + 1 queries.

Preloading child tables isn't necessarily wise all the time; it really depends on what you intend to do with the data after you fetch it. As the Agile Rails book warns, preloading all that data takes time. If you look at your log files, you'll see that it's a significant amount.

If you're only going to load a limited number of these book list elements on a single page at a time, it might actually make sense to forgo preloading the child tables and just use a find() instead of a select:
class SceneController < ApplicationController
  def list
    @books = Book.find_books
  end
end

class Book < ActiveRecord::Base
  def self.find_books
    find(:all, :conditions => ["created_on > ?", 6.month.ago],
         :limit => 20, :order => "created_on desc")
  end

  def bookmarked_by?(user)
    Bookmark.find(:first,
      :conditions => ["book_id = ? and owner_id = ?", id, user.id]) ? true : false
  end
end

And if you're going to display counts of child records, by all means use counter caching. It's easy to do (as long as you follow the instructions!) for most situations.
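
A minimal counter-cache sketch, assuming you've added an integer bookmarks_count column (defaulting to 0) to the books table in a migration:

class Bookmark < ActiveRecord::Base
  belongs_to :book, :counter_cache => true
end

Rails then keeps books.bookmarks_count up to date as bookmarks are created and destroyed, so displaying book.bookmarks.size reads the cached column instead of firing a COUNT(*) query.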

Intuitively, once you display more than some number n of book list elements, it makes more sense to use :include and select over the preloaded collection. However, I wanted to point out that when you make decisions like this, you'll always want to measure the load times, because you earn what you measure.

Also, use the right number of runs. The fewer times you run a function, the more variation you'll have in your benchmarks. Let's say you get two numbers for two different methods:
$ bench -u http://localhost:3000/method1 -r 50 -c 5
50....45....40....35....30....25....20....15....10....5....
Total time: 240.383527755737
Average time: 4.80767055511475

$ bench -u http://localhost:3000/method2 -r 50 -c 5
50....45....40....35....30....25....20....15....10....5....
Total time: 156.147093772888
Average time: 3.12294187545776

So it's obvious that method2 is better, right? Well, not necessarily. bench only shows averages, so you'll need to pay attention to standard deviations. The bigger the standard deviation, the more runs you need to pin down the average load time, and the fewer decimal places you can trust. That's how you figure out whether the difference in load times is statistically significant or not.
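
bench doesn't report the spread itself, so here's a rough sketch of computing it, assuming you've collected the individual request times into an array yourself:

def mean(times)
  times.inject(0.0) { |sum, t| sum + t } / times.size
end

def stddev(times)
  m = mean(times)
  Math.sqrt(times.inject(0.0) { |sum, t| sum + (t - m) ** 2 } / times.size)
end

times = [4.8, 5.1, 4.4, 5.3, 4.6]  # made-up sample timings, in seconds
puts "avg #{mean(times)}, stddev #{stddev(times)}"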

With that, you can ascertain whether the optimizations you made were worth the trouble or not. Tip!

Tuesday, September 04, 2007

Mobtropolis Public Release

I've been working on Mobtropolis for about 10-12 weeks now. It was finally released last Tuesday. It's something that expands people's worlds by helping them discover and share local adventures. The easiest way to think about it is as a dynamic, large-scale photo scavenger hunt, or a photo-dare site that helps you expand your world--hopefully for the better.

Behavior is hard to change, so it's framed slightly as a game. The basic mechanics should be familiar to those who frequent social news sites. Anyone can submit scenes or vote them up; the higher something's voted, the higher its visibility to others. Anyone can do a scene and take a photo as proof. They can then send it in via their camera phone, or upload it from their digital camera when they get back to their desktop. Friends who voted for a scene will then get an email with the photo attached.

It's been oddly thrilling to get photos of people doing scenes that you submitted.

There's still a lot of work to be done on it. Eventually, I hope to marry the virtual and the real in a tighter loop, with better integration with mobile devices. But I'm putting it out in accordance with the startup mantra of "Release early, then iterate like crazy". So check it out, and if you'd be so kind, give me some feedback, good or bad, so I can make it better.

http://www.mobtropolis.com

Enjoy!

Sunday, September 02, 2007

Use barriers to your advantage

I used to read iwillteachyoutoberich, as he had some useful tidbits, and I, like many twenty-somethings, usually worry about personal finances. But anyway, the one piece I really liked, that I gleaned from him, was his take on barriers:
I think the source of 95%+ of barriers to success is…ourselves. It’s not our lack of resources (money, education, etc). It’s not our competition. It’s usually just what’s in our own heads. Barriers are more than just excuses–they’re the things that make us not get anything done. And not only do we allow them to exist around us, we encourage them. There are active barriers and passive barriers, but the result is still the same: We don’t achieve what we want to.

He had another post where he turned it around and said that you can make barriers work to your advantage as well, not just in avoiding kooks, but in increasing your productivity.

Since I do web dev, the browser's up all the time, and it's really easy to just hit ctrl-t, www.facebook.com. And then before you know it, a whole half hour's been wasted. It's even worse with proggit or Hacker News. A couple days ago on reddit, there was a tip on 4 lines to increase your productivity (can't find it now), and it was just lines in an /etc/hosts file. It reminded me of barriers, so I decided to try it out.

I set up my /etc/hosts file:

127.0.0.1 www.facebook.com
127.0.0.1 news.ycombinator.com
127.0.0.1 news.octoparts.com
127.0.0.1 programming.reddit.com

# if I'm really having problems concentrating:
127.0.0.1 www.gmail.com
127.0.0.1 mail.yahoo.com


And lo and behold, it actually worked. Just the extra step of having to type in a command and a password is enough to keep me from slacking off. I still hit ctrl-t once in a while, but then I'm reminded that it's fruitless, and I might as well get back to work. My friend Ian closes everything but a maximized emacs window as his productivity trick.

Anyone got any others they fool themselves with to get crackin'?

Saturday, September 01, 2007

Ajax.PeriodicalUpdater has a decay option

Prototype JavaScript framework: Ajax.PeriodicalUpdater

There's plenty of treasure in API docs, I've found--like when you need two submit buttons for an AJAX form. While tutorials are helpful for just getting started, I'm a firm believer in browsing through API docs and references once in a while, like a lazy groundskeeper checking for garden gnomes. I also like reading dictionaries. I don't do that too often, just when I'm looking up words. I never got any papers done until internet dictionaries came around.

The past two days, I've been playing more with JavaScript, and that involved looking more closely at the Prototype library that comes with Rails. So far, my experience with Prototype has been pretty good. It's less high-level than, say, mookit, but I think it was meant to fill holes in the JavaScript language itself. Even little things like Try.these() are nice, given the JavaScript discrepancies between browsers.

As a result of browsing through the Prototype API, I found that the adaptive polling I had talked about before is already in the Prototype library--it was just never mentioned in any of the Rails docs or tutorials about periodically_call_remote().

Though I don't know if it was around when I blogged about it last December, that should be a lesson to me to stop talking and just try writing a patch, since Prototype is open source. I probably would have learned a lot.