Sunday, February 18, 2007

each vs. inject vs. map

Even after working in Matlab for a fair amount of time, where you have to think in terms of matrix operations, I still find it hard to shake the "loop through it" kind of thinking that comes from my C heritage when dealing with collections of things. So say I have posts with comments, and I want to get all the comments of all the posts.

all_comments = []
posts.each { |post| all_comments += post.comments }

This was the way that I used to do it, using more C-like thinking. I never really liked doing it this way, because you have that floating initialization of all_comments, and it can get separated from the actual loop when you have all sorts of stuff going on. Then I found inject:

posts.inject([]) { |all_comments, post| all_comments + post.comments }

This code does the same thing, but with the initialization pretty much in the loop itself. I liked it a lot better. However, map has its uses:

posts.map { |post| post.comments }.flatten

I'm not sure which one is faster on my machine, but the last one has an appeal in that a "map" operation tells me that this piece of code can be done in parallel. Given the way it's written, semantically it means that every piece in the collection can be calculated independently of the others, and then put together at the end (with flatten)--regardless of how it's actually implemented right now. That is also true of the first two code pieces, but "inject" and "each" do not immediately imply that the work can be parallelized.
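To check that the three versions really do agree, here is a small sketch you can paste into irb. The Post struct and the sample comments are made up for illustration; any object that responds to comments with an array should behave the same way. The last line uses flat_map, which is not in the snippets above and only exists in Ruby 1.9.2 and later.

```ruby
# Hypothetical stand-in for a blog post model; only #comments matters here.
Post = Struct.new(:title, :comments)

posts = [
  Post.new("first",  ["nice", "thanks"]),
  Post.new("second", []),
  Post.new("third",  ["agreed"])
]

# each: the accumulator is initialized outside the loop
via_each = []
posts.each { |post| via_each += post.comments }

# inject: the accumulator is threaded through the block
via_inject = posts.inject([]) { |all_comments, post| all_comments + post.comments }

# map + flatten: one independent result per post, combined at the end
via_map = posts.map { |post| post.comments }.flatten

# flat_map (Ruby 1.9.2+): maps and flattens one level in a single call
via_flat_map = posts.flat_map { |post| post.comments }
```

One thing to watch: flatten with no argument flattens recursively, so if an individual comment were itself an array it would get flattened too, while the each, inject, and flat_map versions only join one level.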

I know that compilers nowadays are pretty sophisticated, with pipelining and all. But having a programmer use "map" could only help the compiler figure out which parts of the code parallelize, no?
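Ruby won't do this for you, but the independence that "map" implies is easy to exploit by hand. Here is a rough sketch, again with a made-up Post struct, that hands each post to its own thread and collects the results afterwards. Treat it as an illustration of the shape rather than a speedup: on the standard Ruby interpreter a global lock keeps threads from running Ruby code on several cores at once, so this would mostly pay off if fetching the comments involved waiting on I/O.

```ruby
# Hypothetical stand-in for a blog post model; only #comments matters here.
Post = Struct.new(:title, :comments)

posts = [
  Post.new("first",  ["nice", "thanks"]),
  Post.new("second", []),
  Post.new("third",  ["agreed"])
]

# One thread per post; each block only touches its own post,
# which is exactly the independence that map promises.
threads = posts.map { |post| Thread.new { post.comments } }

# Thread#value waits for the thread to finish and returns the block's result.
# Mapping over the threads array keeps the results in the original post order.
parallel_comments = threads.map(&:value).flatten
```

The each and inject versions can't be split up this way as written, because every step reads the accumulator left behind by the previous one.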

Update: I found another post talking about closures in Haskell, since the Java people are resisting closures in Java. It gives a better argument for why a for loop is no good.
