Crash testing

Networks grow steadily more complex. Voice over Internet Protocol (VoIP) is on the rise, with video on its heels. Virtual LANs are more common, 100-Megabit-per-second speeds are giving way to gigabit, storage-area networks are proliferating, security is a greater concern, and network reliability matters more and more.

So when something goes wrong, the network administrator needs to know why, and fast. Finger-pointing just won’t do: the network people say the problem is with the server, the server people say it’s the application, the application guys blame the network, and every equipment vendor says it’s the other guy’s product, not theirs. But locating the root cause of a problem is not always simple.

“Complexity is increasing,” says Ronald Gruia, enterprise communications program leader at research firm Frost & Sullivan Canada Inc. in Toronto. “Of course, the sophistication of tools is increasing as a result of this added complexity, but … you have to learn a lot more and you have to do a lot more than you have in the past.”

“Computers aren’t designed to work and play well together,” adds Brent Hanbury, national brand manager for Tivoli Systems at IBM Canada Ltd. in Markham, Ont., “and it almost fosters all the finger-pointing.”

Tivoli software maintains database

Gruia suggests that a network manager requires at least a couple of different types of troubleshooting tools. One is a pre-testing tool, able to take a snapshot of a network at a specific time and provide information on such things as latency and packet loss. An example is Everett, Wash.-based Fluke Networks Inc.’s CableIQ. Brad Masterson, Fluke’s product manager in Canada, says the device can determine if a cable is ready to handle, say, VoIP or Gigabit Ethernet.
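To make the idea of a pre-test snapshot concrete, here is a minimal Python sketch that summarizes one round of probe results (round-trip times in milliseconds, with None marking a probe that got no reply) into packet-loss and latency figures. It is not how CableIQ or any Fluke product works; the function names and the VoIP-readiness thresholds (1 per cent loss, 150 ms average) are hypothetical values chosen only for illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def snapshot(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one round of probes taken at a single point in time.

    Each entry is a round-trip time in milliseconds, or None if the
    probe never came back (counted as a lost packet).
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    return {
        "sent": sent,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


def looks_voip_ready(s: dict, max_loss_pct: float = 1.0, max_avg_ms: float = 150.0) -> bool:
    """Hypothetical pass/fail check; thresholds are illustrative, not a standard."""
    return (
        s["avg_ms"] is not None
        and s["loss_pct"] <= max_loss_pct
        and s["avg_ms"] <= max_avg_ms
    )


result = snapshot([12.0, 15.0, None, 13.0])
print(result["loss_pct"], looks_voip_ready(result))
```

With one reply missing out of four, the snapshot reports 25 per cent loss and the check fails, the kind of result that would flag a link before VoIP traffic is put on it.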

The network-troubleshooting arsenal should also include monitoring and performance management. Tools such as Hewlett-Packard Co.’s OpenView, or NetView from the Tivoli unit of IBM, provide an over-all picture of the network, sound the alarm when there is a malfunction and help the network administrator drill down to the root cause of the problem.

Tivoli’s Enterprise Console suite, which includes NetView, helps network administrators manage, monitor and test the network, Hanbury says. Alarms can be set to warn of incipient problems before they become serious, and “should a problem erupt, it also is able to drive deep, right down to the router level or whatever, to avoid finger-pointing.”

Hugo Garcia, senior IT specialist in the Tivoli Group, says the software maintains a relational database of everything it processes, and tracks down the cause of transient problems that occur when network technicians are not around to observe them, such as in the middle of the night.

NetMRI software from Netcordia, Inc. of Annapolis, Md., builds a picture of the network when it is first installed, then monitors it and produces regular reports giving the network scores for stability and correctness. The stability rating measures actual availability, explains Paul Markun, Netcordia’s vice-president of marketing, while the correctness rating evaluates how well the network conforms to proper configuration. Configuration errors may not be noticeable, especially if a network is not heavily loaded, but can create problems later, particularly when more sensitive applications such as VoIP are installed.

Markun says the management tools that network equipment manufacturers such as Cisco provide for their gear are fine, but tracking the over-all health and correct configuration of the network is not their main priority. He maintains products like Netcordia’s can help eliminate finger-pointing between vendors. In fact, Markun claims an executive of Nortel Networks Corp. told him he was glad to see NetMRI at client sites where Nortel provides VoIP technology and Cisco provides network infrastructure, because it helps Nortel show customers when a problem originates with the network rather than the VoIP gear.

While many people associate Fluke with physical-layer troubleshooting tools such as cable testers, the company offers a range of products including its OptiView and NetTools series. Masterson describes OptiView Console as a low-end monitoring system, which provides a continuous picture of network performance. When something goes wrong, Masterson says, “it reduces the time to find the problem.”

Increasingly, network monitoring focuses on applications, not just on network devices. Tools that monitor applications are more likely to alert the network administrator to issues that will prompt calls from unhappy users. However, such tools should look beyond a poorly performing application and find the underlying cause.

Garcia says Tivoli recommends customers monitor the performance of business transactions from end to end, using tools that can decompose those transactions into their components. He says Tivoli acquired two companies in 2004 — Candle and Cyanea — to help it build a stronger application-monitoring offering.

Compuware Vantage tests application performance

Compuware Corp.’s Vantage places monitoring points at key places in the network to determine if users are getting the service levels they should. “We focus on the application and the service level that your IT organization is delivering to the customer,” explains Lloyd Bloom, Vantage product manager at Detroit-based Compuware. Bloom says Vantage helps fight finger-pointing by determining what transactions or applications are affected by a problem, and providing analysis to help determine if the cause lies in the application, the client or the network. It also includes predictive analysis tools to help network administrators seek solutions to problems.
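Vantage’s own analysis is proprietary, but the basic idea of deciding whether a slowdown lies in the client, the network or the application can be sketched by splitting a transaction’s response time into those three parts and seeing which one grew most against a known-good baseline. The Transaction type, its field names and the baseline-comparison rule below are all assumptions for illustration, not Compuware’s method.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """One business transaction's response time, split into three parts (ms).

    Hypothetical structure for illustration only.
    """
    name: str
    client_ms: float       # time spent on the user's machine
    network_ms: float      # time spent in transit across the network
    application_ms: float  # time spent in the server/application tier

    @property
    def total_ms(self) -> float:
        return self.client_ms + self.network_ms + self.application_ms


def likely_culprit(current: Transaction, baseline: Transaction) -> str:
    """Name the component whose time grew the most versus a known-good baseline."""
    growth = {
        "client": current.client_ms - baseline.client_ms,
        "network": current.network_ms - baseline.network_ms,
        "application": current.application_ms - baseline.application_ms,
    }
    return max(growth, key=growth.get)


normal = Transaction("login", client_ms=50, network_ms=80, application_ms=200)
slow = Transaction("login", client_ms=55, network_ms=400, application_ms=210)
print(slow.total_ms - normal.total_ms, likely_culprit(slow, normal))
```

Here the login is 335 ms slower than usual and almost all of the increase is network transit time, so the sketch points at the network rather than the application or the client.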

Besides being used on functioning networks, Bloom says, Vantage’s predictive analysis tools can help test the performance of new applications over the network before they are put into production.

Fluke’s SuperAgent is another application performance monitoring tool. By isolating the cause of a problem to the network, the server or the application, says Masterson, it can reduce finger-pointing. However, application monitoring works best in co-operation with other tools that can, for instance, zero in on the exact point in the network where the trouble begins.

Tools like San Jose, Calif.-based Network General Corp.’s Sniffer can look at the individual packets traveling over the network. Telus Corp. uses the Sniffer products, particularly the Distributed Sniffer version, which monitors a network at multiple points, to pinpoint problems with customers’ networks. Curtis Sperle, a Telus network analyst, says his usual troubleshooting procedure is to check the cabling first for physical issues, then turn to Distributed Sniffer to find the location of the problem. While the Sniffer may not always indicate exactly what the problem is, it will generally show where it is.

One shortcoming of such tools has been that they only examine current traffic on the network. If something went wrong at 3:00 a.m. when no network technician was present, and the problem has since disappeared, a tool that sees only live traffic cannot help track down the cause. Network General recently responded with InfiniStream, an appliance that monitors packets as its Sniffer products do, but can store that data for days or even weeks in up to five terabytes of on-board storage. “The customer base is saying we need more packets stored for a longer period of time,” explains Tom Bienkowski, product line manager at Network General, and with storage becoming cheaper, responding to that need has become practical.
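Whether five terabytes amounts to days or weeks depends on how much traffic is being captured. A back-of-the-envelope Python calculation shows the range, assuming every byte on the link is stored and ignoring capture-file overhead, compression and packet slicing (all simplifications, not a description of InfiniStream’s actual behaviour):

```python
SECONDS_PER_DAY = 86_400


def retention_days(storage_tb: float, avg_mbps: float) -> float:
    """Days of full-packet capture that fit in the given storage.

    Simplifying assumptions: decimal terabytes, every byte on the wire is
    stored, and no per-packet overhead, compression or packet slicing.
    """
    storage_bytes = storage_tb * 1e12    # decimal terabytes -> bytes
    bytes_per_sec = avg_mbps * 1e6 / 8   # megabits per second -> bytes per second
    return storage_bytes / bytes_per_sec / SECONDS_PER_DAY


for mbps in (1000, 100, 20):
    print(f"{mbps:>5} Mbps average: {retention_days(5, mbps):6.2f} days")
```

A link running flat out at 1 Gbps would fill the storage in about 11 hours, one averaging 100 Mbps in about 4.6 days, and one averaging 20 Mbps in a little over three weeks.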

Bienkowski says a long-range view of network activity can sometimes bring useful perspective to solving a problem. Has the type of traffic through a router changed over the past month? Have applications been changed? Has usage increased? “You see that through long-term reporting,” he says.

Sometimes it takes detective work to solve problems

At the physical layer, cable testers deal with the actual cable. Agilent Corp.’s WireScope 350 is an example; aimed at cabling contractors, it can be used to certify cable installations and to locate problems. Palo Alto, Calif.-based Agilent also offers the FrameScope 350, which has an Ethernet interface and combines basic cable testing with analysis of 10- and 100-Mbps Ethernet networks. Charles Ganimian, business development manager at Agilent, says this tool is designed to answer questions like “why is e-mail slow?”

Sometimes, Gruia says, pinpointing the cause of a network problem can be very easy. Other times, “you need to do some detective work.” One piece of advice he does offer is that time spent in properly configuring network-monitoring tools pays off in the long run. Alarms can be set up to warn of network problems before users would notice them, and to give specific information about what is going wrong. The more time and thought given to setting these up properly, says Gruia, “the better off you are.”
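As a rough illustration of Gruia’s point about setting alarms up carefully, the hypothetical Alarm class below raises a warning at a level below the point where users would notice trouble, escalates to critical at a higher level, and waits for several consecutive readings before firing so that a single spike does not trigger it. The thresholds and the three-sample rule are assumptions; real monitoring suites expose their own, different configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alarm:
    """Hypothetical two-level threshold alarm with a consecutive-sample rule."""
    metric: str
    warn_at: float        # early-warning level, set below what users would notice
    critical_at: float    # level at which users are likely to be affected
    consecutive: int = 3  # readings that must all exceed a level before alarming
    _history: List[float] = field(default_factory=list, init=False, repr=False)

    def observe(self, value: float) -> str:
        """Record a reading and return 'ok' or a message naming the metric."""
        self._history.append(value)
        recent = self._history[-self.consecutive:]
        if len(recent) < self.consecutive:
            return "ok"  # not enough readings yet to judge
        if all(v >= self.critical_at for v in recent):
            return f"CRITICAL: {self.metric} at {value}"
        if all(v >= self.warn_at for v in recent):
            return f"WARNING: {self.metric} at {value}"
        return "ok"


util = Alarm("core router utilization %", warn_at=70, critical_at=90)
for reading in [50, 75, 80, 85, 95, 96, 97]:
    print(reading, util.observe(reading))
```

The warning fires at the fourth reading, once three samples in a row sit above 70 per cent, and escalates to critical only after three consecutive readings above 90, well before a one-off spike would have paged anyone.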

Masterson says networks are becoming more reliable, but also more complex. “They don’t break as often,” he says, “but when they do break, if you don’t have the right tools to help you get into the heart of these new technologies, you’re going to be in the dark.”
