Wednesday, February 14, 2007

For the love of programmers, we need better concurrency abstractions

Lately, I've been pretty interested in parallelism. Processors are moving to multi-core architectures, and while I expect that computers will keep following Moore's Law for a while longer, I think there's a lot to be gained from figuring out how best to make use of multiple processors, especially for tasks that can be easily parallelized, such as 3D graphics, image processing, and certain artificial intelligence algorithms. If compilers, and subsequently programmers, can't take advantage of these multiple processors, we won't see performance gains in future software.

However, when it comes to programming for multiple processors, the general consensus among programmers is, "avoid it if you can get away with it." Multi-threaded programming has generally been considered hard, and with good reason: it's not easy to reason about multiple threads running through the same code at the same time, each at a different point, and the side effects that will have. Synchronization and mutex locks don't make for an abstraction that holds up well as the code base gets larger.
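To make that concrete, here's a minimal sketch in Ruby (my stand-in language for all the examples below; the counter and the worker threads are made up for illustration) of the kind of lock-based code I mean:

    require 'thread'

    counter = 0                # shared mutable state
    lock    = Mutex.new        # every access must remember to take this lock

    threads = (1..4).map do
      Thread.new do
        10_000.times do
          lock.synchronize { counter += 1 }   # without this, increments can interleave and lose updates
        end
      end
    end

    threads.each { |t| t.join }
    puts counter               # 40000, but only because every writer was disciplined

The lock itself is simple; what doesn't scale is remembering, across a growing code base, every place that has to take it.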

One of the ways people have been able to get around it is to reduce the amount of data that different threads and processes need to share. Sometimes this is enforced by a no-side-effects policy, as in functional programming, and other times it's done with algorithms that are share-nothing by nature. Google's MapReduce seems to be a good example of this.
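As an illustration of the share-nothing style (a toy in the same spirit, not Google's actual MapReduce), here each thread counts words in its own document, and only the final combining step brings the results together:

    require 'thread'

    documents = ["the cat sat", "the dog sat", "the cat ran"]

    # Map: each thread works on its own document -- no shared mutable state.
    partial_counts = documents.map do |doc|
      Thread.new do
        counts = Hash.new(0)
        doc.split.each { |word| counts[word] += 1 }
        counts                        # the thread's result is just its return value
      end
    end.map { |t| t.value }           # Thread#value joins and hands back that result

    # Reduce: combine the independent results in one place, after all the threads are done.
    totals = Hash.new(0)
    partial_counts.each do |counts|
      counts.each { |word, n| totals[word] += n }
    end

    p totals   # => {"the"=>3, "cat"=>2, "sat"=>2, "dog"=>1, "ran"=>1}

No locks are needed, because nothing is ever written to from two threads at once.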

But some programs and algorithms require shared data, so multi-threaded programming over shared state is, in some sense, unavoidable. Since that's what we're introduced to as THE way to do multi-threaded programming, it's all I knew for a long while. So I started to wonder: is the current concurrency abstraction of synchronized threads and mutexes the only one that exists?

Apparently not. Like all good ideas in CS, they seem to have come from the 1960's. There are futures, software transactional memory, actors, and joins (scroll down to concurrency). The post from Moonbase gives a probable syntax for these abstractions in Ruby--they don't exist yet, but he's thinking about what they might look like. I'm excited about this, if it makes programming multi-threaded applications easier; that way, programmers can more easily exploit multi-core processors or clusters for speed.
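I haven't reproduced the Moonbase syntax here, but to give the flavor of two of these, here are rough sketches built on the Ruby threads that exist today (the names are mine):

    require 'thread'

    # A future: start the work now, block only at the point the answer is needed.
    def future(&block)
      Thread.new(&block)          # Thread#value joins and hands back the block's result
    end

    price = future do
      sleep 1                     # stand-in for an expensive computation
      42
    end
    # ... do other useful work here while the future runs ...
    puts price.value              # blocks here, and only here, for the result

    # An actor: the state lives inside one thread, and everyone else just sends messages.
    class CounterActor
      def initialize
        @mailbox = Queue.new
        @count   = 0
        Thread.new do
          loop do
            msg, reply = @mailbox.pop
            case msg
            when :inc  then @count += 1
            when :read then reply << @count
            end
          end
        end
      end

      def inc
        @mailbox << [:inc, nil]
      end

      def read
        reply = Queue.new
        @mailbox << [:read, reply]
        reply.pop
      end
    end

    c = CounterActor.new
    10.times { c.inc }
    puts c.read                   # => 10, with no explicit locks in sight

The point isn't that these particular sketches are good implementations; it's that the locking disappears from the code that uses them.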

Most of the time, parallelism is exploited for speed, but I think it can also be exploited for robustness. It's no secret to programmers that code is fairly brittle: a single component that isn't working correctly becomes a runtime bug for the entire system. I think parallelism could be used to alleviate this problem, in exchange for giving up some of the execution speed that parallelism would otherwise buy.
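One narrow reading of that trade-off, sketched below: run the same task on several redundant workers and take whichever answer comes back first, so one flaky worker can't take the whole computation down. We spend extra cores to buy fault tolerance. (flaky_lookup is made up, and a real version would also have to handle the case where every worker fails.)

    require 'thread'

    def flaky_lookup(id)
      sleep rand                          # unpredictable latency
      raise "worker #{id} crashed" if rand < 0.3
      "answer from worker #{id}"
    end

    results = Queue.new
    3.times do |id|
      Thread.new do
        begin
          results << flaky_lookup(id)     # report back if we survive
        rescue
          # a crashed worker is simply ignored; the others cover for it
        end
      end
    end

    puts results.pop                      # first surviving answer wins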

The reason I think this might be an interesting area to explore is the relatively recent interest in natural complex and emergent systems, such as ants foraging for food, sugarscape, and termites gathering wood piles. More technical examples are the decentralized P2P technologies, such as BitTorrent. This is nothing new, as agent-based modeling has been around for a while, in the form of genetic algorithms and ant algorithms. However, none of the currently popular programming languages has good abstractions for exploiting parallelism for robustness this way.

This might be a bit hard to design for, since it relies on building simple actors that produce an emergent system-wide effect while only sharing or using local information. It's not always easy to ascertain analytically what the global effect of many interacting simple actors will be, since it might not even be tractable. And in reverse, given a desired emergent global effect, finding the simple actor that will produce it is no walk in the park. However, I think that once achieved, it will have a robustness that makes it more adaptable than current systems.
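To show what I mean by simple actors with only local information, here's a toy, single-threaded sketch of the termite wood-pile idea (the grid size, agent count, and step count are all arbitrary):

    SIZE     = 30
    chips    = Array.new(SIZE) { rand < 0.3 }                 # 1-D world: true means a wood chip sits here
    termites = Array.new(5) { { :pos => rand(SIZE), :carrying => false } }

    50_000.times do
      termites.each do |t|
        t[:pos] = (t[:pos] + [-1, 1][rand(2)]) % SIZE         # wander one step, wrapping around
        if !t[:carrying] && chips[t[:pos]]
          chips[t[:pos]] = false                              # local rule 1: pick up a chip you step on
          t[:carrying]   = true
        elsif t[:carrying] && chips[t[:pos]]
          drop = (t[:pos] + 1) % SIZE
          unless chips[drop]
            chips[drop]  = true                               # local rule 2: drop your chip next to another
            t[:carrying] = false
          end
        end
      end
    end

    puts chips.map { |c| c ? '#' : '.' }.join                 # chips tend to end up clumped into piles

No termite knows anything about "piles", yet piles are what tends to come out; that gap between the local rules and the global behavior is exactly what makes this hard to design for on purpose.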

If anyone out there knows of such things, post a comment and let me know.
--
Update: I found that there was a debate on Software Transactional Memory just recently, and a nice post on how threading sucks. I know nothing compared to these people.
