Friday, February 08, 2008

Erlang Advocacy and the class of problems it solves

I spend more time than I should reading hacker.news. Granted, I sometimes feel like it's the US Weekly or Maxim of the tech world, when stories like "5 ways you know you're failing at your start up" and the general hubbub over Facebook back in June or the recent Microsoft and Yahoo buyout. However, the quality of the comments there is generally high, and on occasion, I'll start replying to a comment, and before I know it, two hours have passed, and it's better off as a blog post.

The challenge of the Erlang advocate is not to convince me, over and over, that Erlang wins its class; the challenge is to convince me that Erlang's class of problem is so important to my life that I should study Erlang rather than vascular surgery, or television repair, or other obscure technical skills that I don't know that much about.


The follow-up question then would be how many existing problems can be converted to the type that Erlang can solve well? And how many problems previously impractical are now practical to solve? So if it ends up to be a bigger class than you thought, you may well be limiting the number problems that you can solve easily and practically out of the ones that will be important in the future.

I like Erlang (aside from some syntax ugliness), so I'll give it a shot. I think it's important because it allows a program to easier to scale out rather than scale up. If it was running an algorithm that are parallelize-able, then you can just technically add more cheap processors to it for a speed up, rather than designing a faster processor. We'd want speedups in this way because CPUs are becoming multi-cores, and to take advantage of them one will have to write some type of parallel program, since it's proven difficult to fully automate parallelizing serial programs. In addition, with bandwidth pipes getting bigger and the internet more and more pervasive, it is possible that you can leverage other computers you don't own (but given permission) for computational or storage resources in the future most of whom don't belong to a single entity. (think more SETI@home than Amazon)

In addition, parallelized systems (not just erlang) can be more fault tolerant and can fail more gracefully (or hobble along, if you'd like to call it that). Sensor networks are one example. Instead of a single radar to detect the environment, you throw a bunch of sensors out there, and they network themselves and report what they see/hear back to you. If some of them gets destroyed that's ok, because the system's still functioning with less sensors.

A swarm of UVAs doing surveillance in an area is another example. If you have a single computer commanding all the UVAs, it's actually quite hard, because you don't want them to crash into each other, so that's N^2 comparisons (less if you do oct-trees, probably). And if a target comes into the area, it becomes a non-trivial allocation problem: how do you decide which UVA to assign to monitoring it, and when do you switch them out when their fuel runs low? It ends up that doing it in with an actor model, where each UVA decides what to do at any given moment (local interactions) might not be optimal, but it's redundant and fault-tolerant.

Biological systems work this way as well. Gene expression is actually a network of genes being turned on and off by proteins expressed by other genes being turned on and off recursively. If one gene can't activate another, it might not always be detrimental, because there might be other pathways to express that gene. Since I'm not as familiar with the intricacies of biological systems, I'll refrain from saying anymore.

I was surprised when I was watching a video of Alan Kay talking about OOP that the OOP C++/Java I learned in college wasn't what he had in mind. Rather, he meant OOP to be more like biological systems and more process orientated. Encapsulation was only meant so that objects (analogous to actors in erlang) would have to pass messages to each other (method calls).

So what concrete examples of problems fall into this class Erlang is good at? The obvious ones are the embarrassingly parallelized algorithms, like genetic algorithms, neural networks, 3d rendering, and if I'm not mistaken, FFTs. Indexing web pages is another. But then again there are other algorithms that are inherently serial like protocol handshaking or newton's method.

I don't know what the ratio is between embarrassingly parallel and Serial problems are, but my gut is that with the advent of multi-cores and availability of the internet, I think there will be plenty of parallel problems to go around.

Ruby and Haxe language writers are both implementing the actor model like Erlang, if that's any indication of how important they think it is. While I don't think Erlang will be the 100 year language, the ideas by which it's a poster child will reverberate in the descendant languages for a long time to come.

No comments:

Post a Comment