Distributed computation has been around for a while in different forms (Beowulf clusters, for example), but Ian Clarke, the developer of Freenet and founder of Revver, has started working on a Scala-based programming language called “Swarm,” which he hopes will become a distributed programming language that can run on almost any operating system.
Because it runs at the application level, any computer can be a part of Swarm. You run Swarm on any computer you like, and you can access the computing power of other computers running Swarm on the network or, theoretically, on the public Internet. Swarm also lets a programmer write an application for multiple CPUs and multiple computers with the same code that would be written for one CPU on one computer.
Now, there are projects such as SETI@Home or Folding@Home which do similar grid-computing tasks, but both are based on a model of breaking up the data into bite-sized chunks, moving that data to individual machines where it is processed, and then sending the output back to the central server.
Swarm is trying to flip that on its head. With Swarm, you can run the program wherever the data resides. So if you had a piece of data on Computer A, and a piece of data on Computer B, and you wanted to do a calculation that required both A and B’s data, you wouldn’t need to copy the data over the network – the program would execute on both A and B, returning the result of the calculations on B’s data to Computer A. Swarm is designed to manage which software runs with which data on which computer – without the programmer having to think about it beforehand.
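To make the contrast concrete, here is a toy sketch in Python. It is not Swarm's API (Swarm itself is Scala-based); the `Node` class, both function names, and the 8-bytes-per-value figure are assumptions invented for illustration, and the sketch ignores the cost of shipping the program itself. It only counts how many bytes of data cross the network under each model.

```python
from dataclasses import dataclass
from typing import List, Tuple

BYTES_PER_VALUE = 8  # assumption: each integer costs 8 bytes on the wire


@dataclass
class Node:
    """A hypothetical computer holding some local data."""
    name: str
    data: List[int]


def ship_data(home: Node, remote: Node) -> Tuple[int, int]:
    """SETI@Home-style model: copy the remote data to `home`, compute there.

    Returns (result, bytes_moved).
    """
    bytes_moved = len(remote.data) * BYTES_PER_VALUE
    result = sum(home.data) + sum(remote.data)
    return result, bytes_moved


def ship_computation(home: Node, remote: Node) -> Tuple[int, int]:
    """Swarm-style model: run the sum where each data set lives.

    Only the remote partial result travels back to `home`.
    Returns (result, bytes_moved).
    """
    remote_partial = sum(remote.data)   # executed "on" the remote node
    bytes_moved = BYTES_PER_VALUE       # one number crosses the network
    result = sum(home.data) + remote_partial
    return result, bytes_moved


if __name__ == "__main__":
    a = Node("A", list(range(1000)))
    b = Node("B", list(range(1000)))
    print(ship_data(a, b))
    print(ship_computation(a, b))
```

Both functions return the same answer; the difference is that the second sends one number over the wire instead of the whole data set, which is the saving Swarm is after.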
Combine this with the latest advances in dynamic allocation of virtual servers according to need, and you start to really chip away at a whole bunch of scalability problems that have traditionally plagued massively-multi-user-applications… that is, Web apps.
Now, here’s the question. CPU latency is measured in picoseconds. Network latency is measured in milliseconds. So how do you figure out which computations will actually benefit from being offloaded to another computer? That is, which computations are so far back in the stack that it would be better for them to go for a round trip across the Ether than to just wait patiently for the stack to clear? It seems to me that network latency monitoring would be very important for such an application.
For example, let’s use some of the NetQoS Network Estimation Tools (shameless plug) to determine how fast we can theoretically get a calculation going over the network. So, figuring a router latency of 0.5 milliseconds on both ends, a server latency of 2ms, a link speed of 64000, and a (very short) link distance of 10 miles – you’re looking at 132 ms of latency altogether – assuming point-to-point protocol.
In that 132 ms, a 2.4 GHz quad-core computer can perform roughly 1.27 billion calculations locally (2.4 billion cycles per second × 4 cores × 0.132 seconds, assuming one calculation per clock cycle per core). That seems like a lot, and it is. But you actually start saving time once the job needs even one calculation more than that. For some applications, that might be worth it.
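For anyone who wants to plug in their own numbers, the arithmetic can be sketched in a few lines of Python. The function names are made up for this example, and it rests on two assumptions: one calculation per clock cycle per core, and (following the simplification above) a remote computation that costs nothing beyond the network round trip.

```python
def local_calculations(latency_ms: int, clock_hz: int, cores: int) -> int:
    """Calculations the local machine could finish during one network round trip.

    Assumes one calculation per clock cycle per core. Integer arithmetic
    is used throughout to avoid floating-point rounding.
    """
    return latency_ms * clock_hz * cores // 1000


def worth_offloading(work: int, latency_ms: int, clock_hz: int, cores: int) -> bool:
    """True if the job is bigger than what could be done locally in the
    time a round trip takes (remote compute time is ignored)."""
    return work > local_calculations(latency_ms, clock_hz, cores)


if __name__ == "__main__":
    # Figures from the post: 132 ms of latency, 2.4 GHz, four cores.
    budget = local_calculations(132, 2_400_000_000, 4)
    print(budget)  # 1267200000
    print(worth_offloading(budget + 1, 132, 2_400_000_000, 4))  # True
```

A real scheduler would also have to account for how long the remote machine takes to do the work, so treat this as a lower bound on the break-even point rather than a prediction.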
But beyond pure speed, there’s another reason to consider running Swarm: applications coded with Swarm should be able to continue running on other servers, preserving the application in the case of a fault or insufficient resources on the primary computer.
Right now, Swarm is more theory than fact, and there’s a lot of work to be done before it can be practical. But anything that requires less data to be sent over the network is something to keep an eye on when trying to preserve network performance.


