The reports of the death of the network engineer are greatly exaggerated.

By Brian Boyko
Allan Leinwand at GigaOM has written a story entitled “Web 2.0 & Death of the Network Engineer” about his meeting with the CTO of an unnamed Web 2.0 company. There, the CTO said: “The Internet is like electricity. We plug into it and all of the things that you mention are already there for us. We don’t spend any time at all on network or server infrastructure plans.”
In keeping with current Web 2.0 naming conventions, I’m guessing the service will be called “Xcessive Netwerk Retranzmizzns.”
Okay, maybe that’s a little harsh, but this attitude baffles me. If your business depends on services provided over the Web, you’d better have a network that can handle the volume of requests coming in. Sure, you could outsource your data center and networking needs to a third-party service provider, but even then you need to stay apprised of what that provider can actually handle, not just what it tells you it can handle.
Service providers often have SLAs that sound good on paper, but without independent verification you can end up misinformed about your network’s capabilities.
One major company’s service level agreement stated that managed Internet service latency (round-trip transit delay) would be no more than 39 milliseconds. That sounds good, but the method used to calculate that figure was, to put it mildly, flawed. The company measured latency between city cores over the Internet backbones, without factoring in last-mile transmission. It also measured latency as the monthly average of test-packet transmissions across these city-core pairs, and for all we know those test packets were small, prioritized across the backbone, or both. Because the SLA was based on an average rather than a maximum, a particular network link could perform horribly for a long stretch and still average out under the promised figure.
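To see how an average-based SLA can hide sustained bad performance, here is a minimal Python sketch. The latency samples are invented for illustration (hourly round-trip measurements on one hypothetical link over a 30-day month); the 39 ms threshold is the only figure taken from the SLA described above, and the helper names are hypothetical.

```python
from statistics import mean

SLA_MS = 39.0  # the SLA's promised round-trip latency, in milliseconds

def build_month(good_ms: float, bad_ms: float, bad_hours: int, hours: int = 720) -> list[float]:
    """Hypothetical hourly samples: `bad_hours` at bad_ms, the rest at good_ms."""
    return [bad_ms] * bad_hours + [good_ms] * (hours - bad_hours)

def summarize(samples: list[float]) -> dict[str, float]:
    """Compare the figure an average-based SLA checks with what users actually saw."""
    ordered = sorted(samples)
    # Simple index-based 95th percentile (no interpolation).
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return {"average": mean(samples), "p95": p95, "max": max(samples)}

# Three days (72 hours) of 200 ms latency inside an otherwise 20 ms month.
month = build_month(good_ms=20.0, bad_ms=200.0, bad_hours=72)
stats = summarize(month)
print(stats)
print("SLA met (by average)?", stats["average"] <= SLA_MS)
```

With these made-up numbers the monthly average comes out at 38 ms, under the 39 ms promise, even though the link spent three full days at 200 ms. Reporting a percentile or the maximum alongside the average makes that kind of stretch visible.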
But if you don’t walk the C-level executives through how bad performance could get before those averages actually violate the SLA, all they’re likely to remember is that “Company X has a 39 millisecond SLA.” You need to “trust, but verify,” and you can’t verify anything without first planning for what you should be getting.
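That walk-through can be a single calculation. If a link normally runs at a baseline latency and sometimes degrades to a worse level, the fraction f of the month it can spend degraded while still meeting an average-based SLA satisfies f·bad + (1 − f)·good ≤ SLA, so f ≤ (SLA − good)/(bad − good). The sketch below assumes a 30-day month and uses hypothetical 20 ms baseline and 200 ms degraded figures; only the 39 ms threshold comes from the SLA above.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month

def max_degraded_hours(sla_ms: float, good_ms: float, bad_ms: float) -> float:
    """Hours per month a link can sit at bad_ms and still average <= sla_ms.

    Solves f * bad + (1 - f) * good <= sla for the degraded fraction f.
    """
    if good_ms > sla_ms:
        return 0.0  # the link already violates the SLA at its baseline
    if bad_ms <= sla_ms:
        return float(HOURS_PER_MONTH)  # the "degraded" level is still within the SLA
    fraction = (sla_ms - good_ms) / (bad_ms - good_ms)
    return fraction * HOURS_PER_MONTH

hours = max_degraded_hours(sla_ms=39.0, good_ms=20.0, bad_ms=200.0)
print(f"Up to {hours:.0f} hours at 200 ms per month without breaching the SLA")
```

With those assumed numbers the answer is about 76 hours: more than three days of 200 ms latency every month, while the provider can still report SLA compliance.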
To extend the unnamed CTO’s Internet-as-utility metaphor further: electricity isn’t 100% reliable either. Do organizations that require 100% uptime for electricity, hospitals for example, trust the electric company to meet their needs? No. They have UPS systems and generators.
More than that, before they install the generators, they get a good estimate from an electrical engineer of how much power they’d need, and for how long, to generate their own electricity during an outage. They plan to make sure the switchover from grid power to generator power happens instantaneously, and that truly critical equipment gets priority: iron lungs and surgery-room overhead lights will probably rank higher than the televisions and the gift shop.
Electricity is, in many ways, much simpler than networking, because there aren’t eight different types of electricity traveling through your wires. Running a network is like having electricity, gas, sewage, and drinking water all run through the same pipe. (Electricity and water don’t mix; water and sewage do, but you really don’t want them to.)
What a good network engineer does is manage for the best performance when you’ve got multiple types of traffic that behave differently. Some of them are incompatible, so you have to keep them separate somehow. TCP, for example, backs off when it detects congestion, while UDP has no congestion control of its own and keeps sending at whatever rate the application chooses, so unmanaged UDP traffic can crowd TCP flows off a shared link. You also have to figure out how much of the pipe to allocate to each application, anticipate the changes that are coming, and keep the network running when a change occurs that you didn’t anticipate. Otherwise, you get slow servers and lost business. Brownouts can be as annoying and as damaging to productivity as blackouts if what you’re doing is mission critical.
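Deciding how much of the pipe each application gets is often framed as weighted sharing. The sketch below implements weighted max-min fairness (“water-filling”), which is one common allocation policy and not the only one: no application gets more than it asks for, and leftover capacity is redistributed by weight. The application names, demands, and weights are hypothetical; in practice a policy like this is enforced by the network gear’s queuing and shaping configuration, not by a script.

```python
def weighted_max_min(capacity: float, demands: dict[str, float],
                     weights: dict[str, float]) -> dict[str, float]:
    """Split `capacity` across applications by weighted max-min fairness."""
    allocation = {app: 0.0 for app in demands}
    active = {app for app, d in demands.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(weights[app] for app in active)
        # Offer each unsatisfied app a share of what's left, proportional to its weight.
        shares = {app: remaining * weights[app] / total_weight for app in active}
        satisfied = {app for app in active
                     if demands[app] - allocation[app] <= shares[app]}
        if not satisfied:
            # Nobody fits in their share: hand out the weighted shares and stop.
            for app in active:
                allocation[app] += shares[app]
            break
        # Fully satisfy apps whose remaining demand fits in their share...
        for app in satisfied:
            need = demands[app] - allocation[app]
            allocation[app] += need
            remaining -= need
        # ...then repeat with the leftover capacity for everyone else.
        active -= satisfied
    return allocation

alloc = weighted_max_min(
    capacity=100.0,  # hypothetical 100 Mbit/s link
    demands={"voip": 10.0, "web": 50.0, "backup": 80.0},
    weights={"voip": 4.0, "web": 2.0, "backup": 1.0},
)
print(alloc)
```

With these made-up demands, voice gets the full 10 Mbit/s it asks for, web traffic gets its 50, and the bulk backup job absorbs the remaining 40 instead of starving the others.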
We’re used to the Internet being an always-on service in our homes, and we’ve come to expect near-100% uptime from Local Area Networks. But networks don’t scale gracefully. Metcalfe’s law states that a network’s value is proportional to the square of the number of its nodes. So, roughly, is its complexity: every device, switch, router, and server you add creates new potential interactions with everything already on the network.
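One way to see why complexity tracks Metcalfe’s law is to count potential pairwise relationships: a network of n nodes has n(n − 1)/2 distinct node pairs to reason about, which grows roughly with n². A quick Python sketch (the function name is illustrative):

```python
def pairwise_paths(nodes: int) -> int:
    """Number of distinct node pairs, n * (n - 1) / 2, in a network of `nodes` devices."""
    return nodes * (nodes - 1) // 2

for n in (10, 100, 1000):
    print(f"{n:>5} nodes -> {pairwise_paths(n):>7} possible pairs")
```

Going from 100 to 1,000 devices multiplies the node count by 10 but the number of possible pairs by roughly 100 (from 4,950 to 499,500).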
Additionally, Wide Area Network technology is not nearly as mature as LAN technology. Many people assume that if it’s easy to run a local network, it must be easy to run a distributed one. But applications that run well on the LAN often perform poorly over the WAN, where round-trip latency is far higher and bandwidth is scarcer. Complexity increases non-linearly as you add sites and data centers.
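A large part of why LAN-tuned applications struggle over the WAN is latency: an application that makes many sequential request/response exchanges pays the round-trip time on every one. The sketch below uses illustrative numbers, not measurements (200 round trips, a 0.5 ms LAN RTT at 1,000 Mbit/s versus a 40 ms WAN RTT at 100 Mbit/s, 2 MB of data), and deliberately ignores TCP slow start, packet loss, and retransmission.

```python
def transaction_time_s(round_trips: int, rtt_ms: float,
                       payload_bytes: int, bandwidth_mbps: float) -> float:
    """Rough time for a chatty transaction: waiting on round trips plus moving the data."""
    latency_s = round_trips * rtt_ms / 1000.0
    transfer_s = payload_bytes * 8 / (bandwidth_mbps * 1_000_000)
    return latency_s + transfer_s

# Hypothetical workload: 200 sequential exchanges moving 2 MB in total.
lan = transaction_time_s(round_trips=200, rtt_ms=0.5, payload_bytes=2_000_000, bandwidth_mbps=1000)
wan = transaction_time_s(round_trips=200, rtt_ms=40.0, payload_bytes=2_000_000, bandwidth_mbps=100)
print(f"LAN: {lan:.2f} s, WAN: {wan:.2f} s")
```

With these assumed figures the same transaction goes from roughly a tenth of a second to more than eight seconds, and almost all of the increase comes from round trips rather than bandwidth.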
As a network engineer, you do these things – plan for capacity, come up with the most efficient way to serve your core customers, make sure that the hardware is up to the task, manage the complexity between multiple sites and data centers, and keep your applications flowing on the same pipe – all to keep your mission critical applications running with the best performance possible so that the business can continue to thrive.
Now, from a consumer standpoint, maybe it makes sense to think of the Internet as a utility. When I’m at home, I don’t have a backup electricity generator. Then again, neither my life nor my livelihood depends on it.
My livelihood does, however, depend on the Internet, and I trust that my Internet connection will be working most of the time when I need it. Sometimes, like last Friday, it isn’t. (Time Warner sent a tech out Friday afternoon.) Still, I’ve got backup Internet access in my office and in Austin’s many Internet cafes. But when I was working as a Web designer back in the late 1990s, I lost a job because my Internet access was down for an extended period and the service provider took weeks to restore it. The service provider in that case was a college, and I ended up “firing” that college over this issue (and others) by transferring to another one.
So if, even from a consumer-level standpoint, it’s important to plan out your network, prepare for emergencies, and verify that you’re getting the level of service you need, I can’t imagine that the unnamed CTO of the unnamed company will retain that attitude, or that position, for long.
Brian Boyko is editor of Network Performance Daily.