Just a Flag in the Database – Why Networks Fail (to Perform)

Part 3 of a series adapted from Joel Trammell’s keynote speech at the NetQoS Symposium 2008
My favorite story about a changed application: A network team had been fighting a thorny application performance issue for a long time when they saw that the application had changed its bandwidth usage by an order of magnitude overnight. They went back to the application developers and asked, “Guys, why didn’t you give us a heads-up that you had changed this application so dramatically?” An application developer said, “What do you mean, changed dramatically? We just flipped a flag in the database.”
“What do you mean you flipped a flag in the database?”
“Well, we made the graphics field, you know, available to the user.” So the application went from text-based to one showing JPEG images that could be on the order of a megabyte in size, one for every page of the application.
To the application developer, this was not a change of any significance in the application.
But to the networking team, this was a major issue. Often, the application development team will not know which changes are small to them but big to the network engineer. That’s why it’s important to quantify the normal behavior and performance of your key existing applications, so that you can detect this kind of change.
Do you have some sort of alarm in place for variations from that normal performance? Can you detect an unusual traffic flow that might indicate a change in usage or a change in architecture? Perhaps the server team has repositioned servers. That is going to have a dramatic effect on your network, though they haven’t “changed” the network at all, right? Can you reconstruct the transaction timing to understand why things have changed?
So, the key driver for detecting a changed application is going to be response time. You know what the normal baseline response time is, and, hopefully, you’re going to see an alarm when response time deviates from that baseline. Then you’re going to want to understand whether traffic flows have changed, so anomaly detection may help you here. And, of course, packet analysis may again help you understand specifically how the application has changed.
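To make the baseline-and-alarm idea concrete, here is a minimal sketch in Python. It is not taken from the talk or from any particular monitoring product: the function names, the sample response times, and the three-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Summarize historical response times (ms) as (mean, standard deviation)."""
    return mean(samples_ms), stdev(samples_ms)

def is_anomalous(value_ms, baseline, threshold=3.0):
    """Return True when value_ms sits more than `threshold` standard
    deviations above the baseline mean (threshold is an assumed default)."""
    avg, sd = baseline
    if sd == 0:
        # Perfectly flat history: treat anything slower than the mean as unusual.
        return value_ms > avg
    return (value_ms - avg) / sd > threshold

# Hypothetical response times (ms) collected while the application behaved normally.
history = [120, 118, 125, 130, 122, 119, 127, 124]
baseline = build_baseline(history)

# New measurements; the last one imitates the slowdown after the "flag flip".
for sample in (126, 131, 480):
    if is_anomalous(sample, baseline):
        print(f"ALARM: {sample} ms is far above the normal baseline")
```

The same comparison could be run against bandwidth per application instead of response time. A real deployment would likely use a rolling window and time-of-day baselines rather than one fixed list of samples.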
