Thursday, December 11, 2008

The Web and all that Jazz: Git remote branch notes

A while back, I was wondering how to make a remote branch from a local branch, because all the solutions I knew of created a remote branch from master, using origin.

After trying out a couple of things, I realized I could just create the remote branch from master and rebase the damn thing:

git push origin {new_remote_branch}
git branch --track {new_remote_branch} origin/{new_remote_branch}
git checkout {new_remote_branch}
git rebase {local_branch}
git push
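
For what it's worth, a refspec push seems to let you skip the rebase dance and create the remote branch directly from the local branch ({local_branch} and {new_remote_branch} are placeholders, as above):

git push origin {local_branch}:{new_remote_branch}

The --track and checkout steps above still apply if you want a separate local branch that follows the new remote one.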


Not super interesting, but useful enough and short enough to blog about. tip!

Friday, November 21, 2008

Augmenting Google's search results

Here, techcrunch was complaining about Google's newly released feature that lets you change the ordering of your search results. The point that seems to have escaped people is that the search results you change are only local to you. Other people don't see them.

Because of that, I think Google can honestly encourage the behavior where people send links relevant to themselves up to their own front page. If they vote up things that have no relevance to themselves, they only screw up their own searches.

So if you have millions of people sending things relevant to themselves up the list, Google is basically gathering data about what the human 'gold standard' for a particular search query is (augmented by machine sorting first).

In information retrieval, one of the hard things is to know whether you've done a better job of returning search results or not, since a gold standard dataset is hard to come by and time consuming to produce.

Here, I think Google is simply using a mechanism where they're aligning people's self-interest to create accurate data they can use to compare their search algorithm tweaks.

All in all, a good thing. Even better if you can 'merge' search results from others that you trust..say your friends. I wouldn't mind merging my search results with _why's. More complexity that way, though.

Friday, October 24, 2008

Frogmetrics on techcrunch

As some of you know, I joined a YCombinator company called Frogmetrics. We've launched, and were covered by Techcrunch last night.

I work on the server side of things, which means the analytics package which we call Mastermind. Working on the architecture over the summer was a little bit different than the typical rails app.

Mastermind allows users to view graphs of survey data for a particular region, location, or employee at a specific time period.

One of the comments I saw on techcrunch asked why we didn't use iPhones. That's a common question, and the answer is a mix of technical and business reasons. If we handed iPhones to people, they would readily know it was an iPhone and be more likely to run off with one. Sorry Nokia, but your tablets just aren't as sexy to run off with. Alternatively, telling people to go to a webpage on their own iPhones would be another barrier to taking the survey. It wouldn't get the high response rates that we're seeing right now. The idea is to make it as easy and painless as possible to take a survey. In our pilots, we found that older folks had a much tougher time reading the text. Nokias have bigger screens, so we could put bigger text on there, not to mention big beautiful buttons that afford pushing.

Well, if you or your company has interest in our product, feel free to let us know on the contact page. I dislike that we don't have our prices on the web page as of yet, but that'll be worked out soon.

Wish us luck (and hard work)!

As a side note, interesting things that I post on here don't always have to do with Frogmetrics. Like any good engineer, I play or think about interesting things whether they're related to work or not. So don't take it that whatever I'm playing with here is what's going into Frogmetrics. :)


Thursday, October 02, 2008

Benchmarking row multiply in Ruby's Linalg

While using LinAlg to write and understand/review information gain, I found that I wanted the equivalent of a row multiply, notated as [*] (it's non-standard. I made up the notation), where:
A [*] x = B

[[a00, ..., a0M]        [[x0]      [[a00 * x0, ..., a0M * x0]
 [ ...,     ... ]  [*]   [...]  =   [   ...,            ... ]
 [aN0, ..., aNM]]        [xN]]      [aN0 * xN, ..., aNM * xN]]

I want to map each row with each corresponding value of x. This operation is not standard matrix multiplication, though I feel like it should be!

I admit I slept in Linear Algebra class. Thus I was a bit dumbfounded about how to express it using normal matrix multiplication. While I could do it using LinAlg's mapping functions, I knew that it could be done because it was all linear transformations.

Luckily James Lawrence, who maintains LinAlg, was emailing me and asked me about it (thanks!). He gave me some code that did it both ways. After I read it, I slapped myself.

# using straight up matrix multiplication
def calculate1(con)
  DMatrix.diagonal(con * DMatrix.new(con.hsize, 1, 1)).inverse * con
end

# using map
def calculate2(con)
  DMatrix.diagonal((con * DMatrix.new(con.hsize, 1, 1)).map { |e| 1.0 / e }) * con
end
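
As a quick sanity check of that identity (a sketch that only uses the DMatrix calls already shown here; the variable names are mine), diag(1/row_sums) * A divides each row of A by its row sum, so every row of the result sums to one:

require 'linalg'
include Linalg

a        = DMatrix.rand(3, 4)            # random 3x4 matrix A
ones     = DMatrix.new(a.hsize, 1, 1)    # column of ones, so a*ones is the vector of row sums
row_sums = a * ones
scaled   = DMatrix.diagonal(row_sums.map { |e| 1.0 / e }) * a   # the "row multiply" by 1/row_sum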

I wasn't sure which one was faster. Maybe they'd be about the same. Maybe doing matrix multiply would be slower because you'd have to go through all the multiplications. Per James's suggestion, I benchmarked them, in the fine form of yak shaving.

Running on a 1.1GHz Pentium M laptop with 758MB of RAM, I did two experiments. For the first, I kept the matrix at 800 rows and grew the number of columns. Let's see what it looks like:

time (secs) vs column size

I should have labeled the axes in GnuPlot, but you'll live. The y axis is the number of seconds, and the x axis is the column size. Uninteresting, but nice to know it's linear. The two functions aren't significantly different in the amount of time they take. I expected the map version (calculate2) to be much faster since it doesn't have to do all the multiplies. Oh well.

I almost didn't run the second test, but it proved to be a bit more interesting. This time I kept 800 columns and grew the number of rows. Same axis. Different graph:

time (secs) vs row size

Whoa! It's exponential. Or quadratic. I can't tell. Anyway, anything curving up is bad news. I suspected this might have something to do with row-major/column-major ordering: C stores matrices row by row, whereas Fortran stores them column by column. Update: As corrected by James L., in the first experiment, growing the columns creates more multiplies inside the diag() call, but the size of the diagonal stays the same. Growing the rows creates fewer multiplies inside the diag() call, but each increase in row size increases both dimensions of the resulting diagonal matrix, giving us n^2. So it's quadratic.

So what I found wasn't what I went looking for. And given that I've read about this sort of thing before, it would have made sense had I thought about it, so I guess it's not super surprising. Let's see if we can use transpose to cut down on the time. We'll grow the rows as before, and compare that to growing the rows but transposing the input and then transposing the output, to get the same result. What's it look like:

time(secs) vs row size

This is good news. Even if transpose is an extra couple of manipulations, it saves us computation at bigger matrix sizes. The most interesting part of the graph is where the two lines cross. If LinAlg (or any other package, for that matter) could detect where that crossover point is going to be, it could switch between the two implementations. The only thing I can think of is a layer underneath that randomly samples each implementation as users call the function, interpolates the growth curves, and calculates the crossing analytically. I don't currently know of any package that does this (or if one does, I don't know about it, cuz it already performs so well by doing the switch!)

This was a nice little diversion from my side projects...a side project of side projects. Back to learning about information gain and its ilk. At least something came out of it. I have a nice little experiment module that I can use to do other experiments. And I spent way too much time on it not to post something...

I share my code below, and you can run it if you want. snippet!

#!/usr/bin/ruby
# These are the experiments that I ran

require 'experiment'

require 'linalg'
include Linalg

def calculate1(con)
  DMatrix.diagonal(con * DMatrix.new(con.hsize, 1, 1)).inverse * con
end

def calculate2(con)
  DMatrix.diagonal((con * DMatrix.new(con.hsize, 1, 1)).map { |e| 1.0 / e }) * con
end

DATA_PTS = 10
MAX = 800
MID = MAX / 2
STEP = MAX / DATA_PTS

# Growing columns
Experiment::compare(1..MAX, STEP,
  proc { |col_size| DMatrix.rand(MID, col_size) },
  proc { |matrix, col_size| calculate1(matrix) },
  proc { |matrix, col_size| calculate2(matrix) })

# Growing rows
Experiment::compare(1..MAX, STEP,
  proc { |row_size| DMatrix.rand(row_size, MID) },
  proc { |matrix, row_size| calculate1(matrix) },
  proc { |matrix, row_size| calculate2(matrix) })

# Growing rows, transposing
Experiment::compare(1..MAX, STEP,
  proc { |row_size| DMatrix.rand(row_size, MID) },
  proc { |matrix, col_size| calculate1(matrix) },
  proc { |matrix, col_size| calculate1(matrix.transpose).transpose })


# experiment.rb - perform timed experiments across one dimension
require 'rubygems'
require 'gnuplot'

# Experiment is a module you can use to benchmark pieces of code
# and plot the timings with gnuplot
module Experiment
  class << self

    def timer
      start_time = Time.now
      yield
      diff_time = Time.now - start_time
      puts "That run took #{diff_time} seconds"
      return diff_time
    end

    # Takes the range to try and the step resolution of the range, and runs
    # the benchmark_block at each step of the range.
    # The init_proc gets run once during setup at each step.
    #
    # Experiment::benchmark((1..50), 10,
    #   proc { |col_size| DMatrix.rand(500, col_size) },
    #   proc { |matrix, col_size| ProbSpace.new(matrix).entropy })
    def benchmark(range, resolution, init_proc, benchmark_block)
      xs, ts = [], []
      range.step(resolution) do |x_size|
        object = init_proc.call(x_size)
        xs << x_size
        ts << timer { benchmark_block.call(object, x_size) }
      end
      plot(xs, ts)
    end

    # Same idea as benchmark, but does two at the same time. It
    # takes an init_proc to initialize the experiment,
    # and two different procs, standard and comparison, to run
    # at each step of the range at the given resolution.
    #
    # Experiment::compare((1..50), 10,
    #   proc { |col_size| DMatrix.rand(500, col_size) },
    #   proc { |matrix, col_size| ProbSpace.new(matrix).entropy1 },
    #   proc { |matrix, col_size| ProbSpace.new(matrix).entropy2 })
    def compare(range, resolution, init_proc,
                standard_block, comparison_block)
      xs, s_ts, c_ts = [], [], []
      range.step(resolution) do |x_size|
        object = init_proc.call(x_size)
        xs << x_size
        s_ts << timer { standard_block.call(object, x_size) }
        c_ts << timer { comparison_block.call(object, x_size) }
        puts "#{x_size} = standard : comparison :: #{s_ts.last} : #{c_ts.last} secs"
      end
      plot(xs, s_ts, c_ts)
    end

    def plot(x, *ys)
      Gnuplot.open do |gp|
        Gnuplot::Plot.new(gp) do |plot|
          ys.each do |y|
            plot.data << Gnuplot::DataSet.new([x, y]) do |ds|
              ds.with = "linespoints"
              ds.notitle
            end
          end
        end
      end
    end

  end
end

Monday, September 29, 2008

Somebody knows Shoes

I'm a fairly late adopter compared to some of you. When Shoes came out, I had followed it, but never tried it. It's been at least 8 months now, and I finally got around to it. I guess I don't really try something out unless I have a use for it in my mind. Still waiting for that Haskell project...

Anyway, trying out Shoes was...easy. Outside of compiling it, actually. I take that back. Once I read the README and figured out what packages I needed, it was easy. Even easier when I found the executables. Running the sample code was...easy. And trying out your own app...you guessed it...even easier. It helps that there's a ton of examples and documentation. Makes me think that when programming, you should keep the API of your code in mind, so that it's easy to use after you've implemented it.

Two things strike me immediately. _why took his ability to make DSLs in Ruby to heart. I don't have to think about the buffering, cursor positions, etc. Even better, the functions are orthogonal and consistent.

You can not only move shapes around to follow your mouse, but also widgets, like buttons.
Shoes.app(:width => 300, :height => 300) do
  background snow
  @o = stack do
    button("Message") { alert("Sent!") }
  end

  motion do |x, y|
    @o.move width - x, height - y
  end
end

Motion gets executed whenever your mouse moves around on the Shoes screen, so you can have a button dancing opposite your mouse. Another surprising thing is that an app is structured like a webpage, and different pages have urls. And routing is declared in the app itself! Not only that, it takes regexes:
class Dictionary < Shoes
  url '/', :index
  url '/(\w+)', :word

  def index
    stack do
      title "Enter a Word"
      @word = edit_line
      button "OK" do
        visit "/#{@word.text}"
      end
    end
  end

  def word(string)
    stack do
      para "No definition found for #{string}. ",
           "Sorry this doesn't actually work."
    end
  end
end

Shoes.app

I'm not sure how far this sort of thing can go when building more complex GUIs, but so far, it seems pretty promising. There were a couple of things that were broken on Ubuntu, however. I couldn't get margins to work, nor rounded corners:
Shoes.app(:margin => 20) do
  background red, :radius => 12
end

Boo. Rounded corners would have been exciting.

Well, try it out and see if you like it. There's lots of neat things people have built with Shoes already.

I, on the other hand, just built a prototype. Nothin to see here folks.

Wednesday, September 24, 2008

EmacsWiki: Moving The Ctrl Key


I put off doing this for too long now. I have to admit, if there are settings that I can do in a GUI without mucking around in a config file, I'd do it. Ubuntu has made it easy.

Monday, September 22, 2008

Installing Ruby's linalg in Ubuntu

Ruby has a project for download called linalg that exposes the LAPACK functions in Ruby. LAPACK is a set of routines written in Fortran to do Linear Algebra operations. Now, before you cry foul about Fortran, LAPACK is used in a lot of places, and it's pretty damn old and has held up over the years.

The linalg package was just updated about two weeks ago, for the first time since 2004. Unfortunately, there's no easy gem to install. You'll have to download the tarball and then run:

sudo ruby install.rb

Of course, you'll run into some problems, unless you have other packages installed:

sudo apt-get install lapack3 lapack3-dev libg2c0-dev

Don't worry: if you don't like it, you can run the uninstall. Read the INSTALL file for more directions.

I used to wonder how people knew which packages to look for. Apparently, you look at the output and see which files the configure step is choking on, then use apt-file to search for the package that contains the file. It's one of those basic things that seems not worth mentioning in hindsight. Like calling "source ~/.bashrc" after you've edited your bash files. No one ever mentions that.

sudo apt-get install apt-file
sudo apt-file update
apt-file list g2c.h

Knowing that, you'll know how to fish, and I'll not post how to install stuff in the future.

tip!

Friday, September 12, 2008

Algorithm always matters, even when parallel

I've finally had some time to start looking at things that looked interesting to me. One of the things that I started looking at was parallel algorithms.

I was reading Hacker News, and I saw some link that led me to discover Cilk, a C extension for easily writing parallel programs.
thread int fib(int n) {
  if (n < 2)
    return n;
  else {
    cont int x, y;
    x = spawn fib(n - 2);
    y = spawn fib(n - 1);
    sync;
    return x + y;
  }
}

This was the example given for calculating Fibonacci numbers in parallel. It's the standard mathematical way to define them, and it looks clean enough. So instead of trying it out in Cilk, I fired up Erlang to try my hand at a port. I found it a little difficult because while you can easily spawn processes in Erlang, there was no quick way for the parent process to wait/sync/join the child processes and get their results. Since that was beside the point of the exercise, I fired up Ruby instead, even though it has a slow threading library (which is supposed to be fixed in 1.9, plus Fibers!). I'll do it with Erlang some other time.

First I wrote the threaded version to mimic the Cilk code:
def fib_threaded(n)
  if n < 2
    return 1
  else
    threads = []
    x = 0
    y = 0
    threads << Thread.new(n - 2) { |i| x = fib_threaded(i) }
    threads << Thread.new(n - 1) { |i| y = fib_threaded(i) }
    threads.each { |thread| thread.join }
    return x + y
  end
end

I don't have a multi-core processor. I'm on a 3 year old 1GHz laptop. At a mere fib(18), it was taking about 21 seconds to run. To see if there was a difference, I wrote a serial version.

def fib_serial(n)
n < 2 ? 1 : fib_serial(n - 1) + fib_serial(n - 2)
end

This one ran much, much faster: about 0.02594 seconds. At this point, it's probably the overhead of thread creation that's making the threaded version take so long to run. Maybe with green threads or lightweight threads, the threaded version would run much faster. That makes me want to try it in Erlang just to compare. But wtf, adding shouldn't take that long, even if it is just 0.025 seconds.
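
A quick way to sanity-check that thread-overhead suspicion (this little benchmark is mine, not one of the original timings) is to time empty create-and-join cycles:

require 'benchmark'

# time 1,000 create-and-join cycles of a thread that does nothing
n = 1_000
secs = Benchmark.realtime { n.times { Thread.new {}.join } }
puts "#{secs / n} seconds per thread create/join"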

When I thought about it, it wasn't an efficient algorithm: there are a lot of wasted cycles. In order to compute f(n), you have to calculate f(n - 1) and f(n - 2) in separate threads.
  • The f(n - 1) thread requires it to spawn two more threads to compute f(n - 2) and f(n - 3).
  • The f(n - 2) thread requires it to spawn two more threads to compute f(n - 3) and f(n - 4).
Notice that the threads for f(n - 1) and f(n - 2) each have to spawn a thread to calculate f(n - 3). And since this algorithm has no way for threads to share their results, they have to recompute values all over again. The higher n gets, the worse the problem is, exponentially.
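
Just to make the recomputation point concrete (this isn't from the original post), a memo shared between calls collapses that exponential blowup, even serially:

def fib_memo(n, memo = {})
  return 1 if n < 2
  # each f(n) is computed once and shared, instead of recomputed per thread
  memo[n] ||= fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
end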

To figure out the speedup you can get from adding more processors, you take the total amount of work required and divide it by the span of the parallel computation graph. If that didn't make sense, read lecture one for Cilk. So for fib(n):

Twork = O(φ^n)

The total amount of work is the total number of calls spawned. Since every f(n) recursively spawns two more, the call tree grows exponentially; it's on the order of φ^n calls, where φ ≈ 1.618 comes from the same recurrence as Fibonacci itself.

Tspan = O(n)

The span is the longest chain of calls that have to wait on each other. That's the height of the tree: f(n) waits on f(n - 1), which waits on f(n - 2), and so on down to the base case, so it's about n calls deep.

Therefore, for fib(n), the parallelism, or the most processors you can usefully throw at it, is:

Tw / Ts = O(φ^n / n)

That sounds like great news: there's an enormous amount of parallelism to soak up processors. But it's parallelism made entirely of wasted work. A serial loop does the whole job in O(n) work, so no number of processors running this recursive version will ever catch up to it; they're all just recomputing the same subproblems in parallel. Not so good for a parallel program that's just doing addition.

As a last version, I wrote one that computes the Fibonacci sequence from 0 up to n, keeping the running values as it goes, instead of the recursive version that has to work its way from n back down to zero.

def fib_loop(n)
  sequence = [1, 1]
  if n < 2
    sequence[n]
  else
    sequence = (1..n-1).inject(sequence) do |t, e|
      t << (t[-2] + t[-1])
    end
  end
  sequence[-1]
end

It's not space efficient since I wrote it quickly, but it beat the pants off the other two, running at 0.00014 seconds. As you can see, you're not recalculating any f(n) more times than you need to.

I wish Cilk had a better first example of a parallel program. Given that the guy making Cilk is the same guy that co-wrote the famous textbook on algorithms, I guess I was surprised. However, it was a fun exercise, and it made me think about algorithms again.

I'll find some other little project that requires me to write in Erlang, rather than falling back on the comfortable Ruby. snippet! The code above is runnable if you want to try it yourself.

Wednesday, September 10, 2008

Anonymous scope, the unknown cousin of Named scope

Last time, I showed you the well known named scopes. This time, I'll talk about the little documented anonymous scopes.

Anonymous scopes were mentioned briefly on Ryan's Scraps. And in the API docs, I found them tucked away in the ActiveRecord::NamedScope module documentation.
All subclasses of ActiveRecord::Base have two named_scopes:
  • all, which is similar to a find(:all) query, and
  • scoped, which allows for the creation of anonymous scopes, on the fly:
    Shirt.scoped(:conditions => {:color => 'red'}).scoped(:include => :washing_instructions)
These anonymous scopes tend to be useful when procedurally generating complex queries, where passing intermediate values (scopes) around as first-class objects is convenient.
How would this be useful? In the example given, it's really not. And most of the time, named_scope will suffice for what you need to do. However, there are times when named_scope doesn't give you the flexibility you need, and scoped() is actually quite powerful when used in conjunction with association proxies.

I was using the better nested set plugin. It allows you to have a fast-access tree structure in a relational database. And while it's a neat plugin, I couldn't chain my calls like so:
@father.subtree.with_email  # => fails

to find all the father's descendants that have an email. That's because subtree() exists in the plugin and it uses find(), which returns an array of objects. You can't extend the call any further, because find returns an array, not an association proxy.

In our association proxies, if we expect to chain the calls, we can use scoped() instead of find(). Just to demonstrate:
class Person < ActiveRecord::Base
  has_many :people do
    def subtree
      scoped(:conditions => ["lft between ? and ?", proxy_owner.lft, proxy_owner.rgt])
    end
  end

  named_scope :with_emails, :conditions => ["email is not null"]
end

That means we would be able to chain other scopes after subtree():
@father.subtree.with_emails # => the father's descendants that have an email

There's not much to it, but it's nice when, once again, you're into breaking the Law of Demeter.

tip!

Monday, September 08, 2008

not vs !, and vs &&

It's always confusing when you have similar operators with slightly different meanings. What's the difference between not and !, or between and and &&?

The difference is subtle enough that it might not matter most of the time, but it might bite you in the ass one day. The difference is precedence: and, or, and not bind much more loosely than &&, ||, and !, even more loosely than assignment. Just by experimenting, you can see that it's and, or, and not that form statements, which can't be used inside arguments.
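
A quick example of the precedence difference, since and/or bind even more loosely than assignment:

a = false || true    # parses as a = (false || true), so a is true
b = false or true    # parses as (b = false) or true, so b is false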

tip!

Friday, September 05, 2008

ChicagoRuby.org alumni list

Last August, I joined the local Ruby meetup in Chicago, mainly to meet other programmers and see if they knew stuff about Ruby I hadn't come across yet. Since I lived in the suburbs, going to Chirb was a bit of a hike, so it was a good thing ChicagoRuby existed.

At the first meetup, there were seven people, one of them being Ray Hightower. The previous organizer was stepping down, Ray volunteered to step up, and I said I'd help him out. I mainly helped with small things and gave talks when there were none to be had. Since then, Ray's grown the meetup to over 150 people, not to mention setting up a whole conference on Rails. WindycityRails, however, is sold out, so you'll have to wait until next year. Kudos to Ray.

I hadn't been involved with ChicagoRuby since about March or so, since I've run off to Boston and then New York. So Ray rightfully set me up on the Alumni list. Yay.

Anyway, if any of you are based in Chicago, go to ChicagoRuby, and help Ray out. Volunteer for stuff. He'll appreciate it, and your ruby community will appreciate it too. The benefits of doing work and getting involved far outweigh the costs of doing so.

Set your hostname when installing yaws

I had recently installed yaws to be able to play with Erlyweb, but I haven't had time to. There's been a string of open source projects that I've wanted to dive into to read other people's code: Shoes, jquery, couchdb, ejabberd, bittorrent, ubiquity commands. Those are just the ones I think I can digest. Google's V8 engine looks interesting to peruse through, but it's more time than I have anyway.

Regardless, I was getting this on all my package updates on Ubuntu.
$ sudo apt-get install yaws
sudo: unable to resolve host toughie
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
yaws-chat yaws-mail yaws-wiki yaws-yapp
The following NEW packages will be installed:
yaws
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B/637kB of archives.
After this operation, 2138kB of additional disk space will be used.
Selecting previously deselected package yaws.
(Reading database ... 169590 files and directories currently installed.)
Unpacking yaws (from .../archives/yaws_1.73-2_i386.deb) ...
Setting up yaws (1.73-2) ...
hostname: Unknown host
chown: cannot access `/etc/yaws/*.pem': No such file or directory
dpkg: error processing yaws (--configure):
subprocess post-installation script returned error exit status 1
Errors were encountered while processing:
yaws
E: Sub-process /usr/bin/dpkg returned an error code (1)


To be honest, I had no idea what *.pem files are. Apparently they're for SSL, and the Ubuntu package can't finish setting up the yaws server when your machine's hostname doesn't resolve correctly.
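
The fix is to make the hostname resolve locally. Something like this should do it (using "toughie" from the error message above; substitute your own hostname):

# /etc/hostname should contain your hostname, and /etc/hosts should resolve it:
echo "127.0.1.1 toughie" | sudo tee -a /etc/hosts

After that, rerunning sudo apt-get install yaws should configure cleanly.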

Monday, August 11, 2008

Radio silence

I've been missing in action because I joined a ycombinator startup (as an employee) this summer. Ever since May, I've been going at it, and lots of things have been hectic. I've only recently gotten back into the swing of the things I usually do. I've learned a lot, mostly things outside of coding.

Mobtropolis will still be up and running, as it mostly takes care of itself. I'm surprised that I still have readers subscribed to the news feed. Hopefully, posting only when I have something to say rather than just making noise helps in that regard.

I have a bunch of backlogged post drafts I've gotta get to.

Wednesday, August 06, 2008

Named scope, how do I love thee

I'm not sure how I missed it, but named_scope is something that I've been looking for. I should really read more of Ryan's scraps. Just in case you don't know, named_scope is a way to add filters and conditions to the finder methods on your model.

There are a couple of other, hipper Rails programmers that covered it months ago, so I'll defer to the original author and the aforementioned Ryan and his table scraps to tell you the basic things you need to know. This functionality has been absorbed into Rails 2.1, and you can find it under the method name named_scope.

In this post, I'll talk about some of the uses I've found for it. There's more code in this one than usual, but it's incremental, so all you have to do is notice what's different between the sets of code examples.

Lately, I've found that I need to mix and match different kinds of conditions in the finder methods in my models. Let's say we have articles that each have many comments. How do we find the comments that have an email address? How about the comments that include a url? We could make another has_many association.


class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
  has_many :comments_with_email, :class_name => "Comment",
           :conditions => "email is not null",
           :order => "comments.created_at desc"
  has_many :comments_with_url, :class_name => "Comment",
           :conditions => "url is not null",
           :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
end

Or, instead of cluttering up the class namespace, we can use an association proxy extension so that instead of calling @article.comments_with_email, we can call @article.comments.with_email (and violate the Law of Demeter):

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc" do
    def with_email
      # we can do it this way
      with_scope(:find => { :conditions => "email is not null",
                            :order => "comments.created_at desc" }) do
        find(:all)
      end
    end

    def with_url
      # or we can do it this way
      find(:all, :conditions => "url is not null",
                 :order => "comments.created_at desc")
    end
  end
end

class Comment < ActiveRecord::Base
  belongs_to :article
end

This is all fine and well, until you need to find all comments with both an email and a url. You can make finders that take arguments, but entertain the following possibility: find() in an association proxy extension actually returns an Array, so you cannot chain the calls, like @article.comments.with_email.with_url.

How do we do this? named_scope() is one way to do it.

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
end

That means you can do things like

@article.comments.with_email

Or you can actually call count(), so the sql issues a COUNT instead of instantiating all the ActiveRecord objects in an array and then calling size, which is much faster:

@article.comments.with_email.count

Not only that, but if there are other models that associate with comments, you have the scoping filters in one place in the code.

class User < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
  belongs_to :user

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
end

So not only can you find all comments with both email and url for an article, you can do the same for users:

@article.comments.with_email.with_url # all comments with email and url of an article
@user.comments.with_email.with_url # all comments with email and url by a user

Therefore, if you have common intersecting conditions that you need, like all the comments on an article within a period of time, named scope will help. For that, I'd like to be able to define a scope that takes a date range:

class User < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article
  belongs_to :user

  named_scope :with_email, :conditions => "email is not null"
  named_scope :with_url, :conditions => "url is not null"
  named_scope :in_period, lambda { |start_date, end_date|
    { :conditions => ["comments.created_at >= ? and " +
                      "comments.created_at <= ?",
                      start_date, end_date] }
  }
end

So now we can call:

@article.comments.in_period(@start_date, @end_date)
@article.comments.with_email.in_period(@start_date, @end_date)


Cool, you say! Now before you go back into your code and start replacing all of your stuff with named_scopes, keep in mind that there are edge cases where named_scope isn't appropriate. I fell into the trap of thinking that I could use named_scope for everything, like a kid with a new hammer to whom the world looks like a nail. So I spent more time than I should have trying to bend named_scope to my will.

One of the things that fails is that there is no way (as far as I know) to override named scope conditions the way with_scope can, outside of going into Rails, messing with it, and submitting a patch.

For example, if we already have an association of comments on the article that sorts in descending order, we cannot have named scopes that ask for the earliest and the latest comment.

class Article < ActiveRecord::Base
  has_many :comments, :order => "comments.created_at desc"
end

class Comment < ActiveRecord::Base
  belongs_to :article

  named_scope :earliest, :order => "comments.created_at asc",
                         :limit => 1
  named_scope :latest, :order => "comments.created_at desc",
                       :limit => 1
end

This won't work because named_scope assumes that you'd want to merge all the conditions throughout the entire chain.

@article.comments.latest   # will work because the sql will look like:
                           # SELECT * FROM `comments`
                           # ......blah blah....
                           # ORDER BY comments.created_at desc,
                           #          comments.created_at desc
                           # LIMIT 1

@article.comments.earliest # will not work because the sql will look like:
                           # SELECT * FROM `comments`
                           # ......blah blah....
                           # ORDER BY comments.created_at desc,
                           #          comments.created_at asc
                           # LIMIT 1

Next time, I'll cover named_scope's cousin that's not very well documented, so it's easy to skip over: anonymous scopes.

Tip!

Sunday, July 20, 2008

Git remote branch notes

I'm surprised that I still have readers. All apologies, but things have changed significantly in the past month. I joined another startup, and have been busy with that. Mobtropolis is still up and running, however, as it pretty much runs itself. So no worries about that.

I've not learned too much, other than how to use git and some things about couchDB on the side, but it should be interesting. I'll post more about it after we launch, which should be soon.

Just to throw down some notes that I usually am looking for about git:

To create the remote branch:
git push origin origin:refs/heads/{branch}
or
git push origin {local_branch}

To delete a remote branch:
git push origin :heads/{branch}
git push origin :somebranch

I'm not sure about this, but it seems like you can create a new remote branch from a local branch with:
git push origin {local branch name}

Anyone know for sure?

Wednesday, May 21, 2008

Segmentation of social news sites

Giles Bowkett: Summon Monsters? Open The Door? Heal? Or Die?

I have to admit, I almost stopped reading after the first couple of paragraphs where he justifies being jerk-ish, but he has a healthy dose of good points towards the middle.

The underlying assumption of the 'wisdom of the crowds' is that people make independent decisions, and that they have the same amount of time to make them. Neither is true on social news sites. The former is untrue because you can upvote stories that are already on the front page. That turns the site into a positive feedback system that blows up and amplifies small signals. It's ok when the community is small, but the larger it gets, the more likely it is that noise makes it to the front page.

The second point I didn't think about until Giles pointed it out explicitly: since votes come for free, the people that spend the most time on the site are the ones that influence it the most.

Combine the two effects and you have a recipe for amplifying noise. The problem is, you need amplification mechanisms like front-page upvoting in place when the site is small in order to grow it, and then when it reaches a certain size, the mechanics of the site need to change (to what, none of us has exactly figured out yet) to protect the users from themselves.

I'm venturing to guess personalization and fuzzy segmentation to be one solution. As Paul Buchheit mentioned earlier about how twitterers hardly get any spam, it's because if anyone's saying stuff you don't want to hear, you can just unfollow them. Twitter works in this regard because there is a built-in small world network with a relatively low transmission rate between nodes (as opposed to facebook which has a small world network, but high transmission rates of information between nodes...which results in lots of unwanted invitations to bite zombies and vampires). Social sites like Digg, reddit, and hacker news, don't really have a network. It's just one single "place", where what happens on it affects everyone, and small perturbations get amplified.

However, I don't think such a strategy would work well in the beginning. The very thing that helps a small community in the beginning hurts a larger community, and the very thing that would protect a larger community from itself would stunt the growth of a new smaller one.

I think this would be an interesting topic, ripe for research. It actually reminds me of ant colonies, where younger colonies act like teenagers: taking more risks, focusing on growing, and experimenting. Older colonies take fewer risks, maintain the brood, and focus on surviving. There's some sort of decentralized mechanism that kicks in for ant colonies to do that, maybe once they reach a certain size. I think looking into that literature might yield some clues into how to design community sites so that they can grow in the beginning and not implode when they get bigger.

Friday, May 02, 2008

Make it easy for users to let others know how awesome they are

I recently got an email from Amy of Blogged.com, as many of you probably have, about how she rated your blog. The editorial list seems pretty spot on, as she did her homework. However, I think it would have been more useful if the listing were filterable by the different criteria she used to rank the blogs, such as quality of updates, frequency of posts, etc.

This blog got an 8.1, as I don't post all that often; I try to post only when I have something to say. But in reality, while 8.1 might sound hot, it puts me way back beyond page 10 of her list, which I'm sure no one really looks at.

But this got me thinking about how word of mouth might work. I suppose one way is to tell people how awesome they are, and encourage them to tell other people how awesome you think they are. In essence, I guess that's what great products do, right? They let you get stuff done, quickly, easily, and with a bit of fun, so that you feel like you're awesome. If you're awesome, you'd like to tell other people how awesome you are.

Within this limited scope, I think something similar for mobtropolis would make a lot of sense. One way for people to feel awesome using mobtropolis is to get a sense of accomplishment from completing something. Or they get validation of that accomplishment simply by friends commenting on it. I need to make it easier for this to happen, and I suspect the more it does, the more people will feel good about themselves.

What about your product or app? For any particular application, and especially if it's a tool, if you can make a user feel awesome, make it easy for them to let others know how awesome they are. It can be stats on their accomplishment, or a limited feature only they can access. Either way, it has to be limited and unique to them, and yet publicly accessible to others.

Wednesday, April 16, 2008

Going to Startup school

I'll be going to startup school this weekend in San Francisco. If you're going to be there, drop me a line at wil @ 3cglabs dt com, and we'll say hello. It'd be interesting to meet the readers of the blog.

If you haven't heard of startup school, it's hosted by ycombinator. The only reason I'm going is because I applied and got in, rather than paying a couple thousand smackeroos for it. Thank you ycombinator. I've been unable to go to conferences due to their outrageously high cost. But it might be ok, because when I think about it, really fresh and new ideas usually aren't at big conferences, but at small ones that no one knows about yet. Startup school definitely is not unknown, but at least it's comparatively small.

I'm not sure exactly what to expect, but I know that I'll have to spend the day explaining mobtropolis over and over again. Time to beef up and whittle down that 3 min explanation. This is something you should do anyway, so work on that!

In the meantime, I'm looking forward to hearing the speakers talk, and hope I get a lot out of it.

Monday, March 31, 2008

Gotchas of internal iFrame facebook apps and external web apps using Facebooker gem

A while back, I added mobtropolis to facebook as an internal app. I decided to go with FBML because there was more support in the how-tos for it, and it promised a tighter look, feel, and integration.

However, unlike many facebook apps, Mobtropolis also exists as a stand-alone external web app. This decidedly made things a little bit hairier, and I had to write a custom mime-response filter to be able to tell whether a call was coming from a web client (HTML), or as an internal facebook app (FBML), in order to authenticate correctly. I also ended up having to write some custom testing methods for it as well.

Then I revamped the layout of mobtropolis.

It's major suckage to have to maintain two separate views, so I decided to go with an iFrame with the internal facebook app. It took a bit of work to convert it to use iFrames, because authentication gets a little bit more complicated. However, it's something that I only have to deal with once. Subsequent changes to the layout won't affect it as much.

In retrospect, I should have gone with an iFrame from the beginning, though at the time, mobtropolis was fairly ugly. This is what people call "judgement"; I made the wrong call and it cost me about three weeks. The thing is, you just make the best decision you can at the time, and make sure you can change directions easily.

There were a couple gotchas when using iFrames.
  1. Double facebook frames on redirect to install page.
  2. External app's layout is wider than iFrame
  3. Facebook only sends fb params on the first call to your app

Hopefully this will save some time for whoever's looking for this info.

1) Double facebook frames

When you use ensure_application_is_installed_by_facebook_user or ensure_authenticated_to_facebook, it will automatically reroute the user to an install page if they haven't installed your application. Problem is, it assumes that you're not in an iFrame. It turns out you can override application_is_not_installed_by_facebook_user in your controllers:

def application_is_not_installed_by_facebook_user
  redirect_to add_internal_facebook_app_url
end

Here, add_internal_facebook_app_url points to an action in a controller (say, my_controller) that renders javascript to change the location of the top frame:

def add_internal_facebook_app
  render :layout => false, :inline => %Q{<script type="text/javascript">
    top.location.href = "<%= session[:facebook_session].install_url -%>"
  </script>}
end

You have to make sure you connect it as a named route so the redirect in the overridden application_is_not_installed_by_facebook_user() works, in config/routes.rb:

map.add_internal_facebook_app('add_facebook_internal_app',
                              :controller => "my_controller",
                              :action => "add_internal_facebook_app")

2) External app is wider than iFrame

I think there is a way to resize the Facebook iFrame, but I didn't find out about it until after I did this. By default, the Facebook iFrame "smartsizes" itself to fill out the rest of the page.

First, I created a stylesheet called fb_internal_layout.css with extra stylings that squeeze the interface into a 446px-wide iFrame. Then I included it in the headers of my layouts as:
<link href="fb_internal_layout.css" id="fb_internal_layout" media="screen" rel="alternate stylesheet" title="Facebook Internal Layout" type="text/css" />

Make sure you include titles in the link, so that you can actually switch it out.

Then we use javascript to turn on or off this alternate stylesheet depending on whether we're in an iframe or not. You can use something like what's described in A List Apart's article on alternate stylesheets to switch out stylesheets.

To detect whether I was in an iFrame, I simply checked whether frames.top == frames.self. If they weren't equal (meaning the page was inside a frame), I turned on the alternate stylesheet.

3) Facebook only sends fb params on the first call to your app

This is actually not a problem if you use FBML. This is also not a problem if you're using iFrames, and you require a user to install your facebook app if they want to see what's on it.

However, even though this is how a lot of facebook apps operate, I don't think it's very user friendly. The user has no way to judge whether they want to install your app if they can't even sample it. I would rather have a user add the app because they want to, rather than get people that add it but then remove it shortly after. That not only gives you an inaccurate indication of how many people really want to use your app, but also annoys the hell out of them.

But making some pages of an iFrame app public is a bit tricky. Only on the first click into your facebook app are there fb_params in the request. Every subsequent click happens inside your iFrame, so it looks as if the user is actually on the external webpage.

There are a couple of solutions, but I ended up storing a flag in the session saying that the user had made a request from the internal app before. You can't override params on subsequent requests, so using old fb_params to authenticate is difficult at best. With that flag set, the session is likely to be coming from the internal facebook app, so when it hits a private page, it gets redirected to install mobtropolis, using 1) detailed above. This is not a perfect solution, but it covers all the cases correctly.
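
Roughly, the session flag looks something like this. The filter names here are my own sketch, not Facebooker API, but add_internal_facebook_app_url is the named route from gotcha 1), and fb_sig is the prefix Facebook puts on its request params:

class ApplicationController < ActionController::Base
  before_filter :remember_facebook_origin

  private

  # set a flag the first time a request arrives carrying Facebook's fb_sig_* params
  def remember_facebook_origin
    session[:via_facebook] = true if params.keys.any? { |k| k.to_s =~ /^fb_sig/ }
  end

  # use on private actions: canvas sessions without a facebook session get
  # sent to the install redirect from gotcha 1)
  def require_install_for_facebook_users
    if session[:via_facebook] && session[:facebook_session].nil?
      redirect_to add_internal_facebook_app_url
    end
  end
end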

This, however, doesn't account for a user that has already installed the app. In that particular case, I just went ahead and got a facebook session on the first request to the facebook app.

Hope that helped, and I hope never to have to mess with this sort of stuff again, and that you don't either. More interesting posts in the future. Tip!

Friday, March 28, 2008

A way to think about design for the naïve hacker

Any technology goes through phases. First, there's the discovery of what it is, and along with it, the implementation. Just getting it to work is exciting, and at this stage, usability is obviously a second thought. It's hard enough getting it running in the first place, because of all the details you have to juggle as an innovative maker.

As a result, we get cool things that are hard to use. The first electric washing machines in the early 1900s didn't actually have a plug, because there were no wall sockets. Where'd you plug it in then? The light socket hanging from the ceiling. That's right: you'd have to unscrew your lightbulb and then screw in your washing machine--all while on a step ladder. If it went haywire, you'd have a heck of a time unplugging it, and that probably wasn't considered very user friendly.

Then there's the phase where we've got a handle on how to build it, and the question becomes: how do we make it easy to use and work well? This is where design comes in. In large part, we've become specialized in what we do. The innovative builders make things that weren't yet possible possible, and then designers come along and create an experience around them. Stereotypically, hackers scoff at design. "It's just icing on the cake!" one might say. To that I say: to scoff at design as a hacker is to scoff at implementation as a theorist--both lack appreciation of what's required.

I don't think it hurts for a hacker to know more about design, and how to think about it. While you may never be as good at putting on the gloss as a professional, you'll be better able to bridge the gap, which will help you be a better hacker. Besides, it will help you when you design APIs or public interfaces to your classes.

On the other hand, you might not need much convincing as a hacker if you're a fan of Apple's products. The Macbook Air, the iPod, and the iPhone are widely touted as stunning examples of design in technology. To a hacker, though, design can seem like strange and touchy-feely territory that relies on "taste" and "intuition". It doesn't have to be that way; there's a good way to approach design if you can think about it in the right way.

When people think "design", they usually think of superfluous yet oddly satisfying lines that swoop or swoosh. They think of chrome, reflections, or shiny surfaces. Especially gradients. Whoo. Lots of smart people in the actual design world have written entire books on "What is design?". I've never read any of them. However, I don't think they would explain it in a way that's easy for hackers to think about.

While there is inherent joy in design for its own sake, I'm going to only tackle a way to think about design for products. Web products, specifically.

I think of design as a deliberate attention to detail in the exposed public interface, in order to do two things: 1) communicate to your users what your product is, and who they are to others, and 2) control the user's experience.

If you have no idea what your product is yet, or are still figuring it out, then there's no point in doing too much design. Design rests very much on what your product is and how it will benefit your user. If you haven't figured that out yet, ignore all the comments about how ugly your web app is, and figure out how to make it something that works, something that people want to use because it's useful. Because remember, design isn't about gradients.

The first part of what I think of as design is communication, usually with little to no words. It answers questions like "What is it?" and "How will it benefit me?" While the questions are deceptively simple, if new users can't tell immediately what it is and how they'll benefit, they'll move on quickly. This is something I'm still working on, since people, no matter who they are, have limited time and attention to pay to any new thing. Pretend you're talking to your really smart uncle who is drunk all the time: get to the simple point, and make it obvious.

Beyond the first impression, the design of the product has to communicate with the user all the time, to answer questions such as "How do I do [something I want to do]?" or "Can I do [something I wish I could do]?" This is why designers talk about the affordances of buttons and interfaces, and how the nipple is the only intuitive interface.

Last major point on communication: design should let the product tell the user what the heck it's doing at any moment the user would care about. If you're at a restaurant, you'd like your waitress to tell you that the food will arrive in 10 minutes, but you don't care what kind of pot the chef is using.

This sort of emphasis is captured in Don Norman's book The Design of Everyday Things, where he goes into detail ad nauseam. I recommend you take a look at it.

The second part, which is the result of communication, is controlling the user experience. Of course, the phrase "user experience" is thrown around a lot, and people don't much say what it is either. One aspect is how a user feels during and after using your product. Do they feel happy? Confident? Like they're having fun? Or do they feel frustrated? Powerless? Incompetent? Like they're wasting time?

It seems like you can't control whether someone's having a bad day or not, but there are certain tricks that designers employ to conjure up feelings. This works because our brains do a lot of the work of interpreting colors and shapes in our culture. If you see a felt red with a holly green, you'll automatically think of Christmas, and perhaps the feelings that go with Christmas. Change the tint slightly, and you can be reminded of traffic lights. The designer doesn't actually have to do very much other than suggest a mood, and the brain will do the rest. It's a good hack.

Lastly, part of the user experience is also what using the product says about the user to their peers. This differs from peer group to peer group, but it's always the same goal: everyone wants the product that brings them higher status in their peer group. This can be achieved by catering to the core values of that group. If your users are business people, they're likely to value efficiency, reliability, and effectiveness. If your users are hackers, they're likely to value ability, interestingness, and functionality. If the design of your product can communicate that a user embodies more of [insert core values] to their peer group, they'll probably appreciate the product without knowing why.

One word of caution: it's hard to identify the core values of a group if you're not a part of it. And you can't start with an associated design and work your way back to the core values of a peer group. This is why a lot of people, when designing for girls, automatically start throwing pink everywhere, and then when the design fails, they wonder where it went wrong. If your design doesn't communicate first what the product does for a user, and second who the user is, then it's going to fail, no matter how much pink you put on there.

I recently revamped the layout and the look and feel of mobtropolis, which is why there's this post, and the two week hiatus from posting. You can see a "before" and an "after".





Before (v0.2)

After (v0.6)



While I can't claim to be a designer by any means, I feel that I was fairly successful in changing the direction of the app as well as invoking a better user experience. Communication will be an iterative process, as there will need to be some back and forth with users, before things get completely resolved. Just going through it gave me some thoughts on the matter, and next time, I'll talk about the specific lessons I've learned, and maybe some things to help a hacker or two do basic design. Til then!

Monday, March 24, 2008

The twenty-seven word score club

Like lots of people on facebook, I've been playing scrabulous. I'm not much of a wordsmith, but I have fun playing people. Justin told me that his goal in life was to span two triple-word scores, to get a 9x word score. So, not to be outdone, I wondered what words would give you a 27x word score if you spanned all three triple-word scores along one edge. You would need fifteen-letter words.

Since you can only put down at most seven tiles per turn, there need to be words already in between the triple-word squares to help you fill out the fifteen letters. These "bridge words" can't sit on a triple-word square themselves, and they must be between two and six letters long on each side, with the total length of both words greater than eight.

So I wrote a program in a couple of hours to find them. I did take into account whether a word was possible to make based on the Scrabble tile distribution, including blanks. There are 286 of them so far in the TWL scrabble dictionary. I didn't find ones that use more than one bridge word on a side. The points aren't completely accurate either.
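
Here's a small sketch of that tile-availability check (the constant and helper names are mine, not from the original program): a word is formable if the shortfall against the standard tile counts can be covered by the two blanks.

TILE_COUNTS = {
  "a" => 9, "b" => 2, "c" => 2, "d" => 4, "e" => 12, "f" => 2, "g" => 3,
  "h" => 2, "i" => 9, "j" => 1, "k" => 1, "l" => 4, "m" => 2, "n" => 6,
  "o" => 8, "p" => 2, "q" => 1, "r" => 6, "s" => 4, "t" => 6, "u" => 4,
  "v" => 2, "w" => 2, "x" => 1, "y" => 2, "z" => 1
}
BLANKS = 2

def formable?(word)
  needed = Hash.new(0)
  word.downcase.split(//).each { |c| needed[c] += 1 }
  # total how many tiles we're short for each letter, then see if blanks cover it
  shortfall = needed.inject(0) { |sum, (c, n)| sum + [n - TILE_COUNTS.fetch(c, 0), 0].max }
  shortfall <= BLANKS
end

formable?("coenzymatically")   # => true
formable?("zzzzz")             # => false (one z tile plus two blanks isn't enough)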

The first number is the points you'd get, followed by the two bridge words. Based simply on the probability of drawing its letters from a full bag, "irrationalistic" is the most likely word. (In reality, this never happens, since you also need to draw the tiles to place the bridge words and reach the sides.)

459 : irrationalistic : ["ratio", "alist"]

You can score a whopping 459 points with it. The word that has the biggest word score is "coenzymatically"

972 : coenzymatically : ["enzym", "tical"]

Yes. "tical" is a word.

ti·cal –noun, plural
1. a former silver coin and monetary unit of Siam, equal to 100 satang: replaced in 1928 by the baht.


There are quite a number of common words you wouldn't expect, as well as quite a number of odd ones. As a note, the point scores aren't exactly accurate: I didn't take into account the double letter scores that might apply if you place a letter on one. But given that the multiplier here is 27, and I picked the longest bridge words (which usually cover the double letter squares), it shouldn't affect things too much. I had held off posting this until I fixed that, but this was a one-off amusement and curiosity rather than anything significant, so I figured I'd just post it. Enjoy!

567 : accountableness : ["count", "lenes"]
648 : accountantships : ["count", "ship"]
486 : administrations : ["minis", "ration"]
648 : ammonifications : ["mon", "cation"]
594 : amorphousnesses : ["morpho", "ness"]
621 : anthropological : ["thro", "logic"]
783 : anthropomorphic : ["thro", "morph"]
594 : antihistaminics : ["his", "aminic"]
540 : antitheoretical : ["tithe", "etic"]
594 : aromatherapists : ["math", "rapist"]
540 : astronautically : ["trona", "tical"]
756 : astrophysically : ["strop", "call"]
675 : astrophysicists : ["strop", "cist"]
540 : atheroscleroses : ["heros", "erose"]
540 : atherosclerosis : ["heros", "eros"]
594 : atherosclerotic : ["heros", "roti"]
540 : authentications : ["then", "cation"]
594 : autoradiographs : ["tora", "graph"]
675 : autoradiography : ["tora", "graph"]
810 : bathymetrically : ["thyme", "call"]
594 : beautifications : ["eau", "cation"]
594 : benightednesses : ["night", "ness"]
675 : blameworthiness : ["lame", "thine"]
540 : brotherlinesses : ["other", "ness"]
486 : businesspersons : ["sines", "person"]
405 : ceaselessnesses : ["easel", "ness"]
729 : chancellorships : ["hance", "ship"]
729 : chemotherapists : ["moth", "rapist"]
810 : cholangiography : ["lang", "graph"]
675 : cholecystitises : ["hole", "titis"]
675 : cholestyramines : ["holes", "amine"]
540 : cholinesterases : ["line", "erase"]
675 : cinematographer : ["nema", "graph"]
729 : cinematographic : ["nema", "graph"]
486 : clandestineness : ["land", "nenes"]
594 : classifications : ["lassi", "cation"]
972 : coenzymatically : ["enzym", "tical"]
648 : comfortableness : ["fort", "lenes"]
567 : commensurations : ["men", "ration"]
594 : commercialistic : ["merc", "alist"]
702 : communistically : ["muni", "tical"]
756 : computerphobias : ["put", "phobia"]
648 : conceivableness : ["once", "lenes"]
567 : concelebrations : ["once", "ration"]
540 : conceptualistic : ["once", "alist"]
540 : conglomerations : ["glom", "ration"]
486 : conglutinations : ["glut", "nation"]
540 : congresspersons : ["res", "person"]
486 : considerateness : ["onside", "ate"]
540 : containerboards : ["tain", "board"]
594 : convertibleness : ["vert", "lenes"]
702 : crashworthiness : ["rash", "thine"]
594 : crotchetinesses : ["rotche", "ness"]
513 : customarinesses : ["stoma", "ness"]
459 : dangerousnesses : ["anger", "ness"]
837 : decarboxylating : ["carbo", "lati"]
810 : decarboxylation : ["carbo", "lati"]
540 : deconcentration : ["once", "ratio"]
540 : deconsecrations : ["cons", "ration"]
621 : dedifferentiate : ["diff", "entia"]
513 : defenestrations : ["fen", "ration"]
513 : delegitimations : ["elegit", "mat"]
567 : delightednesses : ["light", "ness"]
864 : demisemiquavers : ["mise", "quaver"]
810 : denazifications : ["nazi", "cation"]
567 : dialectological : ["alec", "logic"]
486 : diastereoisomer : ["aster", "some"]
648 : dichloroethanes : ["ich", "ethane"]
540 : discontinuances : ["con", "nuance"]
540 : discriminations : ["scrim", "nation"]
486 : disinclinations : ["sin", "nation"]
459 : disintegrations : ["sin", "ration"]
648 : dissatisfactory : ["sati", "factor"]
567 : divertissements : ["vert", "semen"]
675 : dyslogistically : ["slog", "tical"]
567 : eclaircissement : ["lair", "semen"]
486 : educationalists : ["ducat", "alist"]
459 : elaboratenesses : ["labor", "ness"]
459 : emotionlessness : ["motion", "ess"]
756 : encephalographs : ["epha", "graph"]
837 : encephalography : ["epha", "graph"]
675 : enfranchisement : ["franc", "semen"]
621 : epidemiological : ["idem", "logic"]
594 : epistemological : ["piste", "logic"]
540 : epistemologists : ["piste", "gist"]
378 : essentialnesses : ["senti", "ness"]
540 : esterifications : ["rif", "cation"]
594 : eutrophications : ["trop", "cation"]
702 : excrementitious : ["creme", "titi"]
621 : exsanguinations : ["sang", "nation"]
702 : extemporisation : ["tempo", "sati"]
540 : fantastications : ["antas", "cation"]
567 : flibbertigibbet : ["libber", "gib"]
513 : foreordinations : ["ore", "nation"]
567 : fragmentariness : ["ragmen", "rin"]
675 : frightfulnesses : ["right", "ness"]
567 : fundamentalists : ["dame", "alist"]
567 : gentrifications : ["rif", "cation"]
513 : gluconeogeneses : ["cone", "genes"]
513 : gluconeogenesis : ["cone", "genes"]
675 : grotesquenesses : ["rotes", "ness"]
648 : historiographer : ["tori", "graph"]
702 : historiographic : ["tori", "graph"]
432 : houselessnesses : ["ousel", "ness"]
702 : humidifications : ["midi", "cation"]
783 : hypervigilances : ["perv", "lance"]
756 : hypnotherapists : ["not", "rapist"]
567 : identifications : ["dent", "cation"]
459 : illiberalnesses : ["liber", "ness"]
513 : illimitableness : ["limit", "lenes"]
486 : illogicalnesses : ["logic", "ness"]
513 : immaterialities : ["mater", "alit"]
540 : immediatenesses : ["media", "ness"]
459 : inalterableness : ["alter", "lenes"]
459 : inanimatenesses : ["anima", "ness"]
702 : inconsequential : ["cons", "entia"]
567 : inconsiderately : ["cons", "ratel"]
486 : inconsideration : ["cons", "ratio"]
486 : incoordinations : ["coo", "nation"]
567 : incrementalisms : ["creme", "tali"]
513 : incrementalists : ["creme", "alist"]
459 : incuriousnesses : ["curio", "ness"]
567 : indefinableness : ["defi", "lenes"]
486 : indoctrinations : ["doc", "nation"]
540 : indomitableness : ["omit", "lenes"]
648 : inflammableness : ["flam", "lenes"]
513 : instrumentalism : ["strum", "tali"]
459 : instrumentalist : ["strum", "tali"]
540 : instrumentality : ["strum", "tali"]
459 : intolerableness : ["tole", "lenes"]
486 : inviolatenesses : ["viola", "ness"]
459 : irrationalistic : ["ratio", "alist"]
405 : irrationalities : ["ratio", "alit"]
567 : irreconcilables : ["recon", "able"]
729 : kinesthetically : ["nest", "tical"]
648 : lickerishnesses : ["icker", "ness"]
540 : loathsomenesses : ["oaths", "ness"]
621 : magistratically : ["agist", "tical"]
594 : martensitically : ["tens", "tical"]
540 : masterfulnesses : ["aster", "ness"]
621 : mercaptopurines : ["cap", "purine"]
621 : metallographers : ["tall", "raphe"]
702 : methamphetamine : ["eth", "etamin"]
891 : methoxyfluranes : ["ethoxy", "ran"]
729 : methylmercuries : ["ethyl", "curie"]
891 : methylxanthines : ["ethyl", "thine"]
648 : microanalytical : ["roan", "lytic"]
621 : misapplications : ["sap", "cation"]
513 : momentarinesses : ["omenta", "ness"]
486 : monounsaturated : ["nouns", "urate"]
459 : monounsaturates : ["nouns", "urate"]
513 : monumentalities : ["numen", "alit"]
567 : multiplications : ["tip", "cation"]
540 : neurobiological : ["euro", "logic"]
513 : neuroblastomata : ["euro", "stoma"]
783 : neuropsychology : ["euro", "cholo"]
513 : nonbarbiturates : ["barb", "urate"]
486 : nonbelligerents : ["bell", "gerent"]
513 : noncelebrations : ["once", "ration"]
513 : noncooperations : ["coop", "ration"]
594 : nonenforcements : ["one", "cement"]
567 : nonimplications : ["nim", "cation"]
756 : nonphotographic : ["phot", "graph"]
486 : opinionatedness : ["pinion", "ted"]
729 : overextractions : ["ere", "action"]
621 : overmedications : ["med", "cation"]
486 : oversaturations : ["ers", "ration"]
567 : overstimulating : ["verst", "lati"]
540 : overstimulation : ["verst", "lati"]
864 : oxytetracycline : ["tet", "cyclin"]
459 : painterlinesses : ["inter", "ness"]
621 : paragenetically : ["rage", "tical"]
675 : parenthetically : ["rent", "tical"]
648 : parthenocarpies : ["then", "carpi"]
567 : parthenogeneses : ["then", "genes"]
567 : parthenogenesis : ["then", "genes"]
621 : parthenogenetic : ["then", "genet"]
513 : pectinesterases : ["tine", "erase"]
567 : permissibleness : ["miss", "lenes"]
702 : pharmaceuticals : ["harm", "tical"]
729 : pharmacological : ["harm", "logic"]
648 : phenomenalistic : ["nome", "alist"]
675 : photoengravings : ["hot", "raving"]
675 : photoperiodisms : ["tope", "iodism"]
783 : phototelegraphy : ["tote", "graph"]
729 : pithecanthropus : ["theca", "thro"]
648 : planimetrically : ["anime", "call"]
621 : platinocyanides : ["latino", "nide"]
513 : pleasurableness : ["leas", "lenes"]
486 : predestinarians : ["redes", "aria"]
486 : predestinations : ["redes", "nation"]
621 : preestablishing : ["reest", "shin"]
648 : prefabrications : ["ref", "cation"]
594 : preformationist : ["reform", "ion"]
540 : preponderations : ["repo", "ration"]
621 : prepublications : ["rep", "cation"]
513 : presentableness : ["resent", "lenes"]
513 : preterminations : ["rete", "nation"]
594 : prettifications : ["ret", "cation"]
702 : problematically : ["roble", "tical"]
486 : proletarianised : ["role", "anise"]
459 : proletarianises : ["role", "anise"]
675 : proteolytically : ["rote", "tical"]
783 : quantifications : ["anti", "cation"]
729 : rechoreographed : ["chore", "graph"]
621 : recodifications : ["cod", "cation"]
513 : reconcentration : ["once", "ratio"]
513 : reconsecrations : ["cons", "ration"]
486 : reconsideration : ["cons", "ratio"]
459 : redintegrations : ["dint", "ration"]
567 : reductivenesses : ["educt", "ness"]
567 : refrangibleness : ["rang", "lenes"]
513 : regretfulnesses : ["egret", "ness"]
513 : reinvigorations : ["vig", "ration"]
432 : reregistrations : ["egis", "ration"]
567 : respectableness : ["spec", "lenes"]
513 : responsibleness : ["pons", "lenes"]
459 : retroperitoneal : ["trope", "tone"]
540 : retroreflectors : ["ore", "lector"]
594 : rigidifications : ["gid", "cation"]
513 : sacramentalists : ["cram", "alist"]
594 : saponifications : ["apo", "cation"]
810 : saprophytically : ["prop", "tical"]
513 : scintillometers : ["inti", "meter"]
567 : seductivenesses : ["educt", "ness"]
540 : selectivenesses : ["elect", "ness"]
459 : semiterrestrial : ["miter", "stria"]
513 : semitransparent : ["emit", "spare"]
405 : sensationalists : ["sati", "alist"]
459 : sentimentalists : ["time", "alist"]
594 : serviceableness : ["vice", "lenes"]
648 : simplifications : ["imp", "cation"]
594 : slaughterhouses : ["laugh", "house"]
459 : snippersnappers : ["nipper", "napper"]
567 : solidifications : ["lid", "cation"]
594 : solipsistically : ["lips", "tical"]
594 : sophistications : ["phis", "cation"]
540 : spermatogeneses : ["perm", "genes"]
540 : spermatogenesis : ["perm", "genes"]
648 : spinthariscopes : ["pint", "scope"]
567 : sprightlinesses : ["right", "ness"]
540 : stadtholderates : ["tad", "derate"]
864 : straightjackets : ["rai", "jacket"]
540 : stratifications : ["rat", "cation"]
540 : stratovolcanoes : ["rato", "canoe"]
567 : strikebreakings : ["trike", "akin"]
648 : superphenomenon : ["perp", "nomen"]
810 : sympathetically : ["path", "tical"]
351 : tastelessnesses : ["stele", "ness"]
621 : teletypewriters : ["let", "writer"]
567 : thanklessnesses : ["ankle", "ness"]
675 : therapeutically : ["rape", "tical"]
675 : thunderstricken : ["under", "trick"]
432 : toastmistresses : ["oast", "tress"]
432 : traditionalists : ["adit", "alist"]
459 : transaminations : ["ansa", "nation"]
486 : transmigrations : ["ran", "ration"]
621 : trihalomethanes : ["halo", "ethane"]
540 : troubleshooters : ["rouble", "hooter"]
567 : troubleshooting : ["rouble", "hoot"]
513 : troublesomeness : ["rouble", "omen"]
567 : trustworthiness : ["rust", "thine"]
540 : unadulteratedly : ["adult", "rated"]
459 : unalterableness : ["alter", "lenes"]
621 : unanticipatedly : ["antic", "pated"]
513 : unboundednesses : ["bound", "ness"]
729 : unchoreographed : ["chore", "graph"]
459 : uncleanlinesses : ["clean", "ness"]
621 : unclimbableness : ["climb", "lenes"]
702 : uncomplimentary : ["comp", "menta"]
513 : underestimating : ["dere", "matin"]
486 : unearthlinesses : ["earth", "ness"]
702 : unextraordinary : ["extra", "dinar"]
621 : unfavorableness : ["favor", "lenes"]
486 : unguardednesses : ["guard", "ness"]
540 : unrealistically : ["real", "tical"]
513 : unsightlinesses : ["sight", "ness"]
513 : unworldlinesses : ["world", "ness"]
837 : wappenschawings : ["pens", "hawing"]
540 : warrantableness : ["arrant", "lenes"]
486 : westernisations : ["ester", "sati"]
810 : whatchamacallit : ["hatch", "call"]
621 : whippersnappers : ["hipper", "napper"]
621 : wholesomenesses : ["holes", "ness"]
540 : worrisomenesses : ["orris", "ness"]
864 : xeroradiography : ["orad", "graph"]

Saturday, March 22, 2008

What do you take away from it?

This morning, I woke up and read this particular piece from coding horror, the well-known blog about software engineering. Normally, people commenting on each other's essays doesn't hold enough interest for me to weigh in. The recent ones that come to mind are Zed Shaw's Rails is a Ghetto and Clifford Heath's Monkeypatching is destroying Ruby. Even when they bring up the finer points of a subject, it often feels like TMZ. So even if I hear about it, I go back to coding (which is why you haven't seen me here).

What prompted me to write, however, was a re-evaluation of the essay: Jeff Atwood's post made me go back, reread Paul Graham's essay, and rethink it. In the end, I don't think it's the best essay he's written, but I also don't think his point was to say, "hey, you suck ass because you're an employee".

Like most people who care about their trade, Paul Graham has a certain philosophy about programmers. Just as the martial arts have different schools of thought with their own guiding principles, Paul too has his own school of thought when it comes to programming. As far as I can tell, he's mostly concerned with hackers, who aren't just people that write code, but people with an attitude of subverting the norm and a cultivated curiosity about the world, mainly expressed through programming. I think it's these types of people he's mainly trying to reach. If you're someone who's not like that, but enjoys programming as the way you make a living and doesn't think about it when you go home, the essay simply doesn't apply to you.

That said, there are advantages to working at a big company. Aside from the usual concerns about a steady paycheck and health insurance, you get a lot more information thrown your way casually by coworkers, if you're paying attention. When you're at a small company, you have to make an effort to read up on industry news--though given how addictive social tech news is, we might not call it 'effort'.

In addition, some kinds of work need large-company resources, even though it's gotten cheaper to start certain types of tech company. Biotech comes to mind, as well as chip design.

Many times, companies get started because someone at a big company had a good view of the industry and saw a particular need that was going unfulfilled. If the founders hadn't been at the big company to begin with, they wouldn't have had that wide view of the industry as easily, and they wouldn't have been pushed by their employer's lack of interest in the niche to go and start their own.

I venture to guess that Paul writes these sorts of essays on hackers and startups mainly because 1) it's what he knows (and you write best when you write what you know) and 2) there's not enough material out there on it. If you take into account your friends, your parents, guidance counselors, etc, most everyone can tell you how to go get a job. However, not many people can write essays on doing a startup. I believe this particular essay is directed at reaching out to the hackers as described above, and not the general programming audience. In my mind, it's not a bad thing, because, again, you write what you know.

Update:
This comment and the response to it are among the better things I've read over there. They're pretty much on the money. I have an admiration for those who can cut to the chase with clarity in their writing.

Friday, March 07, 2008

Nine letter word riddle

It's not too common that I get forwards nowadays. With the advent of social news, all the stupid links have migrated there. But on occasion, I'll get one from the older crowd. This one was a riddle with a movie of the answer attached.
What common English word is 9 letters long, and each time you remove a letter from it, it still remains an English word... from 9 letters all the way down to a single remaining letter?
It gave only one answer, though: "startling". I ended up wondering whether there was more than one, so I wanted to see how fast I could write something to find out. It'd be good practice, since most web programming is hard on the design and architecture side rather than the algorithms side. Besides, it'd been a while since I'd written a recursive function.

Embarrassingly, it took 2.5-3 hours. I thought I'd be able to knock it out in one. I had some problems at first with removing a single letter from a word. Ever since I came to Ruby, I hardly ever deal with indices, so finding the right methods took a bit of time. Then, recursion is always a bit of a mind bender when you don't do it often.

I also spent some time looking up what counted as one-letter words, but then found out that there's a whole dictionary of one-letter words. So I only considered "i" and "a" as valid one-letter words, and I threw out all contractions.

See if you can write shorter/faster/better code. It's certainly a better workout than fizzbuzz. Seeing as it took me a while and I didn't clean up the code, I've set the bar pretty low to beat. There are other things that would speed it up, too: I don't cache subwords I've already checked while working through a longer word--I just run straight down the list of dictionary words. (There's a memoization sketch after the code at the bottom.) Try it out yourself, in whatever language. (This sounds like a good beginner problem to write in Arc.) Go ahead. I'll wait.


Got it?


Here's the list I came up with, along with each word's chains. You'll notice each chain is actually a tree that branches.

cleansers
[["cleanses", ["cleanse", ["cleans", ["clean", ["clan", ["can", ["an", ["a"]]]], ["lean", ["lea", ["la", ["a"]]]]], ["clans", ["clan", ["can", ["an", ["a"]]]], ["cans", ["can", ["an", ["a"]]]]], ["leans", ["lean", ["lea", ["la", ["a"]]]], ["leas", ["lea", ["la", ["a"]]]]]]]], ["cleanser", ["cleanse", ["cleans", ["clean", ["clan", ["can", ["an", ["a"]]]], ["lean", ["lea", ["la", ["a"]]]]], ["clans", ["clan", ["can", ["an", ["a"]]]], ["cans", ["can", ["an", ["a"]]]]], ["leans", ["lean", ["lea", ["la", ["a"]]]], ["leas", ["lea", ["la", ["a"]]]]]]]]]

discusses
[["discuses", ["discuss", ["discus", ["discs", ["disc", ["dis", ["is", ["i"]]]], ["diss", ["dis", ["is", ["i"]]]]]]]]]

drownings
[["drowning", ["downing", ["owning", ["owing", ["wing", ["win", ["in", ["i"]]]]]]]]]

paintings
[["painting", ["paining", ["pining", ["piing", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]

piercings
[["piercing", ["piecing", ["pieing", ["piing", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]

prickling
[["pickling", ["picking", ["piking", ["piing", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]], ["pricking", ["picking", ["piking", ["piing", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]

restarted
[["restated", ["restate", ["estate", ["state", ["sate", ["sat", ["at", ["a"]]], ["ate", ["at", ["a"]]]]]]]]]

scrapping
[["crapping", ["rapping", ["raping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]

sparkling
[["sparking", ["sparing", ["spring", ["sprig", ["prig", ["pig", ["pi", ["i"]]]]]]]]]

splatters
[["platters", ["platter", ["latter", ["later", ["late", ["ate", ["at", ["a"]]]]]]]], ["splatter", ["platter", ["latter", ["later", ["late", ["ate", ["at", ["a"]]]]]]]]]

splitting
[["slitting", ["sitting", ["siting", ["sting", ["sing", ["sin", ["in", ["i"]]]], ["ting", ["tin", ["ti", ["i"]], ["in", ["i"]]]]]]]], ["spitting", ["spiting", ["siting", ["sting", ["sing", ["sin", ["in", ["i"]]]], ["ting", ["tin", ["ti", ["i"]], ["in", ["i"]]]]]]], ["sitting", ["siting", ["sting", ["sing", ["sin", ["in", ["i"]]]], ["ting", ["tin", ["ti", ["i"]], ["in", ["i"]]]]]]]]]

stampeded
[["stampede", ["stamped", ["tamped", ["tamed", ["tame", ["tam", ["am", ["a"]]]]]]]]]

stampedes
[["stampede", ["stamped", ["tamped", ["tamed", ["tame", ["tam", ["am", ["a"]]]]]]]]]

startling
[["starling", ["staring", ["string", ["sting", ["sing", ["sin", ["in", ["i"]]]], ["ting", ["tin", ["ti", ["i"]], ["in", ["i"]]]]]]]], ["starting", ["staring", ["string", ["sting", ["sing", ["sin", ["in", ["i"]]]], ["ting", ["tin", ["ti", ["i"]], ["in", ["i"]]]]]]], ["stating", ["sating", ["sting", ["sing", ["sin", ["in", ["i"]]]], ["ting", ["tin", ["ti", ["i"]], ["in", ["i"]]]]]]]]]

starvings
[["starving", ["staring", ["string", ["sting", ["sing", ["sin", ["in", ["i"]]]], ["ting", ["tin", ["ti", ["i"]], ["in", ["i"]]]]]]]]]

strapping
[["trapping", ["tapping", ["taping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]], ["rapping", ["raping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]

stringers
[["stingers", ["stinger", ["singer", ["singe", ["sing", ["sin", ["in", ["i"]]]], ["sine", ["sin", ["in", ["i"]]]]]]], ["singers", ["singer", ["singe", ["sing", ["sin", ["in", ["i"]]]], ["sine", ["sin", ["in", ["i"]]]]]], ["singes", ["singe", ["sing", ["sin", ["in", ["i"]]]], ["sine", ["sin", ["in", ["i"]]]]], ["sings", ["sing", ["sin", ["in", ["i"]]]], ["sins", ["sin", ["in", ["i"]]], ["sis", ["is", ["i"]]], ["ins", ["in", ["i"]], ["is", ["i"]]]]]]]], ["stringer", ["stinger", ["singer", ["singe", ["sing", ["sin", ["in", ["i"]]]], ["sine", ["sin", ["in", ["i"]]]]]]]]]

stringier
[["stingier", ["stinger", ["singer", ["singe", ["sing", ["sin", ["in", ["i"]]]], ["sine", ["sin", ["in", ["i"]]]]]]]], ["stringer", ["stinger", ["singer", ["singe", ["sing", ["sin", ["in", ["i"]]]], ["sine", ["sin", ["in", ["i"]]]]]]]]]

trampling
[["tramping", ["tamping", ["taping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]], ["amping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]

trappings
[["trapping", ["tapping", ["taping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]], ["rapping", ["raping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]

whittlers
[["whittler", ["whitter", ["whiter", ["white", ["whit", ["wit", ["it", ["i"]]], ["hit", ["hi", ["i"]], ["it", ["i"]]]], ["wite", ["wit", ["it", ["i"]]]]]]]]]

wrappings
[["wrapping", ["rapping", ["raping", ["aping", ["ping", ["pin", ["pi", ["i"]], ["in", ["i"]]], ["pig", ["pi", ["i"]]]]]]]]]


And here's my code:


#!/usr/bin/ruby

class Array
  # Returns every array you can make from self by dropping exactly one element.
  def one_less
    total = [self[1..-1]]                       # drop the first element
    self.each_with_index do |e, i|
      total << self.values_at(0..i, (i+2)..-1)  # drop the element after index i
    end
    return total[0..-2]                         # the final iteration is a leftover; drop it
  end
end

# All the distinct words you get by deleting one letter from word.
def sub_words(word)
  result = word.split("").one_less.map { |word_array| word_array.join }.uniq
  result == [""] ? [] : result
end

# Returns nil if word isn't in the dictionary or can't be whittled down letter
# by letter through dictionary words; otherwise returns a nested array of
# chains (a tree, since the chains can branch).
def find_sub_word_chain(word)
  return nil unless @@dict.include?(word)
  return [word] if word.length == 1 && @@dict.include?(word)
  valid_sub_words = sub_words(word).reject { |w| !@@dict.include?(w) }
  word_chain = valid_sub_words.map do |sub_word|
    chain = find_sub_word_chain(sub_word)
    if chain.nil?
      nil
    else
      if sub_word.length == 1
        chain
      else
        (chain << sub_word).reverse
      end
    end
  end.compact
  word_chain.empty? ? nil : word_chain
end

# Build the dictionary, keeping only "a" and "i" as valid one-letter words.
@@dict = {}
ARGV[0] ||= "/etc/dictionaries-common/words"
words = File.readlines(ARGV[0]).map { |w| w.chomp }
words.each do |word|
  if word.length == 1
    next unless ["a", "i"].include?(word)
  end
  @@dict[word] = word
end

# Check every nine-letter word (skipping contractions) and print the ones
# that chain all the way down, along with their chains.
words.reject { |e| e.length != 9 }.reject { |word| word =~ /'/ }.each do |word|
  chain = find_sub_word_chain(word)
  next if chain.nil?
  puts word
  puts chain.inspect
  puts
end
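
If you want to run it, pass a word list on the command line (otherwise it falls back to the Debian-style path hard-coded above). The filename here is just whatever you saved the script as:

ruby nine_letter_riddle.rb /usr/share/dict/words

And here's a rough sketch of the memoization I mentioned earlier--not part of the original script, just an illustration that reuses @@dict and sub_words from above. The one wrinkle is that the original builds chains with a destructive <<, which would corrupt a shared cache, so this version builds each chain without mutating the cached arrays.

@@chain_cache = {}

# Same result shape as find_sub_word_chain, but caches every word's chains so
# common subwords (ping, sing, sting, ...) are only computed once.
def cached_sub_word_chain(word)
  return @@chain_cache[word] if @@chain_cache.has_key?(word)
  return @@chain_cache[word] = nil unless @@dict.include?(word)
  return @@chain_cache[word] = [word] if word.length == 1
  chains = sub_words(word).select { |w| @@dict.include?(w) }.map do |sub_word|
    chain = cached_sub_word_chain(sub_word)
    next if chain.nil?
    # Non-mutating equivalent of (chain << sub_word).reverse
    sub_word.length == 1 ? chain : [sub_word] + chain.reverse
  end.compact
  @@chain_cache[word] = (chains.empty? ? nil : chains)
end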