Probing Your Network: Gathering Data on Network Performance and Device Status

According to the old adage, you can't manage what you can't see. That applies perfectly to networks and the countless devices and connections that define and determine their performance level. To keep an eye on the health of your network, you will need a network monitoring system that constantly collects statistics and other diagnostic information and alerts you to problems. Ideally, your network monitoring system will show your network management team potential problems – traffic slowdowns, packet losses, limited memory or hard drive space, heavy load on circuits and servers – that can be addressed before they affect end-user productivity. Proactive network management depends upon collecting data that is as specific to a device as possible. Network management systems that employ a variety of information gathering – or probing – techniques offer the greatest management value.

“The AudioVisual people can't tell what's causing the problem and they typically blame the network. If they didn't have access to InterMapper and if the probe wasn't delivering the data it does, I would have spent the rest of my life figuring out problems for those guys.”

Dennis O'Reilly
University of British Columbia

Methods of Testing a Network

Technology and industry standards offer a series of approaches to testing the status and performance of network devices without radically increasing network traffic. There are also several ways to gather real-time statistics and data that is critical for fast problem diagnosis and resolution.

Ping

Ping is the most commonly used method of network performance measurement. Ping request packets are sent from a management station to a remote device. If the pings arrive at the remote device, it sends ping response packets back to the originator. The originator can then calculate the percentage of packets that are lost and record the response time.
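The calculation the originator performs can be sketched in a few lines of Python. This is only an illustration, not InterMapper's implementation: the function name summarize_pings and the convention of recording a lost packet as None are assumptions made for the example, and sending the ICMP packets themselves is left out.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one round of pings.

    Each entry is a round trip time in milliseconds, or None when no
    response came back (hypothetical convention for this sketch).
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    return {
        "sent": sent,
        "received": len(replies),
        # Share of requests that never got a response.
        "loss_percent": 100.0 * lost / sent if sent else 0.0,
        "min_ms": min(replies) if replies else None,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


if __name__ == "__main__":
    # Five requests, one of which was lost.
    print(summarize_pings([12.0, 14.0, None, 13.0, 11.0]))
```

A monitoring station would typically repeat this at a fixed interval and keep the results so that trends over the day become visible.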

Why is this a good test? The Internet is built on best-effort delivery. Congestion, electrical noise, or other faults in the network may delay, corrupt, or drop one or more of the packets. If a packet isn't received properly, the sender will retransmit, and it's likely that the second copy will arrive as expected. Because ping packets travel in-band, they experience the same delays and loss rates as normal data sent through the network. While pinging is a very simple method for monitoring network performance, its simplicity gives it great power in detecting and diagnosing several problems. Ping tests can help to detect:

  1. Packet loss – Packet loss continues to be a major culprit when networks slow down. A lost or dropped packet requires the sender to retransmit – usually after a pause. If these retransmissions are infrequent, transfer speeds are not affected. But when packet loss is frequent, then the pauses become exponentially longer, which results in dramatic network slowdown. As a rule of thumb, packet loss should be well below 0.1% on LAN links and less than 1.5-2.0% on WAN links.

  2. Packet delay (round trip time) – Packet delay doesn't generally contribute to slow networks because modern network protocols can compensate for delays in a link. However, if the measured round trip times change during the course of the day, for example, starting low during quiet times (typically late evening or early morning hours) and then peaking during busy periods, it could be an indication that links are overloaded and due for an upgrade.

  3. Jitter – Jitter is the variability in the round trip time for a set of packets. Jitter measurement is critical to ensuring the quality of Voice over IP (VoIP). Cisco routers can perform IP Service Level Agreement (IP SLA) measurements by sending a stream of like-sized ping packets in order to measure variability, or jitter, in their return times. (IP SLA also measures round trip time and packet loss.) If jitter is high, that is, if the return times vary widely, VoIP quality will be low.
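The loss thresholds and the jitter measure described in the list above can be expressed as a short Python sketch. Two things here are assumptions rather than anything the text prescribes: jitter is computed as the mean absolute difference between consecutive round trip times (one common definition among several), and the WAN limit is taken as the upper end of the 1.5-2.0% range. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Rule-of-thumb loss limits from the text, in percent.
LAN_LOSS_LIMIT = 0.1
WAN_LOSS_LIMIT = 2.0  # assumption: upper end of the 1.5-2.0% range


def jitter_ms(rtts_ms: Sequence[float]) -> float:
    """Mean absolute difference between consecutive round trip times.

    This is one simple way to quantify variability; it is an
    assumption of this sketch, not a definition from the text.
    """
    if len(rtts_ms) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:]))


def loss_exceeds_limit(loss_percent: float, link_type: str) -> bool:
    """Return True when measured loss reaches the limit for the link type."""
    limit = LAN_LOSS_LIMIT if link_type == "lan" else WAN_LOSS_LIMIT
    return loss_percent >= limit


if __name__ == "__main__":
    print(jitter_ms([20.0, 22.0, 19.0, 25.0]))
    print(loss_exceeds_limit(0.5, "lan"), loss_exceeds_limit(0.5, "wan"))
```

The same 0.5% loss that would be a warning sign on a LAN link is within the rule of thumb for a WAN link, which is why the link type is part of the check.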

Ping measurement does a good job of monitoring the flow of traffic across the network, but proactive network management typically requires more quantitative data. That's where SNMP comes in. Virtually all commercial-grade routers and switches implement SNMP – Simple Network Management Protocol – to return operational statistics and configuration information to a network management application.

SNMP

At the least, SNMP-enabled devices typically provide traffic measurements (packets and bytes per second) and error statistics for each port. Network management applications compare these statistics to utilization thresholds, and tell the network manager when links are heavily loaded (or overloaded), and which links are experiencing errors (packet loss) that lead to poor performance. These errors would also be detected by the ping testing described above, but SNMP allows the network manager to identify which port is causing the problem.
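As a rough illustration of how a traffic counter becomes a utilization figure, the sketch below takes two samples of a port's octet counter (for example ifInOctets from the standard interfaces MIB) and the link speed, and computes percent utilization over the polling interval. Retrieving the values over SNMP is not shown. The function names are hypothetical, and the sketch assumes a 32-bit counter that wraps at most once between samples.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter samples, allowing for one wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, link_speed_bps: float) -> float:
    """Percent of link capacity used between two octet-counter samples."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_speed_bps)


if __name__ == "__main__":
    # 37,500,000 octets in 30 seconds on a 100 Mbit/s link.
    print(utilization_percent(1_000_000, 38_500_000, 30.0, 100_000_000))
```

On fast links a 32-bit counter can wrap more than once between polls, in which case the 64-bit counters introduced with SNMPv2c are the safer source.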

Network management applications can gather more than traffic and error data with SNMP. Hardware manufacturers provide MIB (Management Information Base) files that describe the statistics that their specific devices can report, as well as the object identifiers (OIDs) that a network management application can use to retrieve those data values. Here are examples of the kinds of statistics that are available:

  • Host resources – CPU, memory, and disk utilization as well as information about processes running on the system

  • UPS – Battery temperature, whether the UPS is running on AC power, time remaining while running on battery

  • Wireless – Signal strength, noise levels, number and identity of associated clients, etc.

  • Environmental – Temperature and humidity as recorded by sensors, switch contacts, etc.

Comprehensive network monitoring applications can read all these SNMP values and send alerts when the values exceed customer-settable thresholds.
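A threshold check of this kind might look like the following Python sketch. The Threshold class, the reading names, and the numeric limits are all hypothetical and chosen only for illustration; the values are assumed to have been retrieved over SNMP already. Note that some readings, such as remaining battery time, are a problem when they fall rather than rise.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    warning: float
    alarm: float
    higher_is_worse: bool = True  # False for values like battery minutes


def evaluate(value: float, t: Threshold) -> str:
    """Return "ok", "warning", or "alarm" for one reading."""
    warning, alarm = t.warning, t.alarm
    if not t.higher_is_worse:
        # Flip signs so the same comparisons work for falling values.
        value, warning, alarm = -value, -warning, -alarm
    if value >= alarm:
        return "alarm"
    if value >= warning:
        return "warning"
    return "ok"


def check_readings(readings: Dict[str, float],
                   thresholds: Dict[str, Threshold]) -> Dict[str, str]:
    """Return the status of every reading that is not "ok"."""
    statuses = {name: evaluate(value, thresholds[name])
                for name, value in readings.items() if name in thresholds}
    return {name: s for name, s in statuses.items() if s != "ok"}


if __name__ == "__main__":
    # Hypothetical, user-settable limits.
    limits = {
        "cpu_percent": Threshold(warning=80, alarm=95),
        "battery_minutes": Threshold(warning=15, alarm=5,
                                     higher_is_worse=False),
    }
    print(check_readings({"cpu_percent": 85, "battery_minutes": 30}, limits))
```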

A note about SNMP versions

Three versions of SNMP have been released since its first implementation. All versions offer ways to retrieve the same data from a device, but security and power have improved between the versions. SNMPv1, released in 1990, was the first standardized protocol to enable network monitors to retrieve data from a network device. SNMPv2c, published in 1996, added data types (such as 64-bit counters) and more powerful commands (such as bulk retrieval). Both of those versions are still in wide use. Their downside is that they send all data, including the community string that serves as a password, in plain text. SNMPv3, released in 2002, made data encryption and authentication available for all SNMP traffic, dramatically improving security.

Application and Server Monitoring

Although pinging and SNMP queries provide powerful tools, not all aspects of a server's operation can be monitored by these techniques. There is no substitute for an end-to-end test that simulates user requests to ensure that things are actually working as expected.

A network management system can create synthetic transactions that mimic true end-user experience by sending the same kinds of requests or queries and monitoring the responses and the round-trip time from the device. Synthetic transactions also allow network administrators to stress network connections artificially to determine how additional load affects server performance and where extra bandwidth is needed.

As an example, consider how to test a web server. At its core, it's pretty straightforward: the network management application generates a synthetic transaction, say, an HTTP GET request, and verifies that a response comes back. However, a lot of additional information is available from such a simple test:

  • The lack of a response, of course, indicates that the server isn't operating.

  • The round-trip time (from sending the request to receiving the full response) is indicative of both server load and network conditions.

  • The result code (page not found, redirected, forbidden) can indicate a problem.

  • The content of the returned page can be checked against what was expected.
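The four checks in the list above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not how any particular product implements its HTTP probe: the function names, the two-second slowness limit, and the shape of the returned values are all hypothetical. Also note that urllib follows redirects on its own, so this sketch reports the result code of the final page rather than the redirect itself.

```python
import time
import urllib.error
import urllib.request
from typing import List, Optional, Tuple


def fetch(url: str, timeout: float = 5.0) -> Tuple[Optional[int], str, float]:
    """Send one HTTP GET; return (result code or None, body, seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404 or 403.
        status, body = err.code, ""
    except (urllib.error.URLError, OSError):
        # Refused connection, timeout, name lookup failure, and so on.
        status, body = None, ""
    return status, body, time.monotonic() - start


def evaluate_response(status: Optional[int], body: str, rtt_s: float,
                      expected_text: Optional[str] = None,
                      slow_after_s: float = 2.0) -> List[str]:
    """Return a list of problems; an empty list means the check passed."""
    if status is None:
        return ["no response"]
    problems = []
    if status != 200:
        problems.append(f"unexpected result code {status}")
    if rtt_s > slow_after_s:
        problems.append(f"slow response ({rtt_s:.2f} s)")
    if expected_text is not None and expected_text not in body:
        problems.append("content mismatch")
    return problems


if __name__ == "__main__":
    # example.com is a placeholder; substitute the server under test.
    status, body, rtt = fetch("http://example.com/")
    print(evaluate_response(status, body, rtt, expected_text="Example"))
```

Keeping the evaluation separate from the request makes it easy to adjust the limits, or to reuse the same checks for responses gathered some other way.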

It's also possible to use synthetic transactions to verify that a web server's redirection is working as expected (for example, after a web site redesign) or that proxy servers are functioning properly. Any of these conditions provides information that could trigger alerts to the network management team.

In a similar way, synthetic transactions can keep tabs on standards-based servers including mail servers (SMTP, POP3, IMAP4), DNS servers, LDAP, LPR, RADIUS, and FTP, as well as proprietary servers including Lotus Notes, Citrix, spam firewalls, environmental monitors, UPSs, wireless equipment and others.
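For a mail server, the simplest synthetic transaction is to open a connection and check the greeting. An SMTP server announces that it is ready with a line beginning with the reply code 220. The sketch below assumes that a single read returns the greeting line, which is usually but not always the case; the function names and the host in the usage example are hypothetical.

```python
import socket
from typing import Optional


def read_banner(host: str, port: int, timeout: float = 5.0) -> Optional[str]:
    """Connect and return the first data the server sends, or None."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = conn.recv(512)
    except OSError:
        # Connection refused, timed out, host unreachable, and so on.
        return None
    return data.decode("ascii", errors="replace").strip()


def smtp_banner_ok(banner: Optional[str]) -> bool:
    """True when the greeting starts with the SMTP "ready" code 220."""
    return banner is not None and banner.startswith("220")


if __name__ == "__main__":
    # mail.example.com is a placeholder; substitute the server under test.
    print(smtp_banner_ok(read_banner("mail.example.com", 25)))
```

A fuller probe would continue the conversation, for example by sending a greeting of its own and checking the reply, so that it exercises more of the protocol than the initial banner.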

Custom Probes

A network management application should also provide a means of creating additional kinds of tests. This could include the ability to query various SNMP variables and to build synthetic transactions that mimic the protocol of the other servers.

It is also important that the network management application have a vibrant community of contributors, because you may find that someone else has already solved the same problem.

InterMapper Network Monitoring Software

InterMapper uses all these techniques to gather information on network device status and performance through built-in probes that are developed and tested for specific devices from specific vendors. These built-in probes apply the appropriate data gathering and testing techniques, enabling network managers to focus on the data returned by the probe, not the creation of the probe itself.

InterMapper also facilitates the creation of custom probes that take advantage of data gathering techniques. Custom probes are useful for legacy or proprietary devices.

Using a combination of built-in probes and custom probes, InterMapper returns a wide variety of performance data that allows network managers to diagnose and resolve problems quickly.

An example: The network manager receives an alert (a page, an e-mail, an audible notification, etc.) that a web server's response time is getting high. The following probes are available for troubleshooting:

  • Ping probes will show if there's packet loss in the network. As noted earlier, packet loss can significantly affect transfer rates and customers' perception of network speed. Because InterMapper is continually monitoring packet loss throughout the network, it can help locate trouble spots.

  • SNMP probes on the routers and switches can show heavy traffic or high error counts. These may be normal peak-time events, or an indication of a problem.

  • CPU, disk, and memory probes (using SNMP to retrieve data from the routers or the server) will show whether there's a CPU or other resource limitation.

  • The response time measurements from the HTTP probe triggered the notification in the first place, and they let the manager monitor the situation to see if it's improving.

InterMapper makes all these various measurements visible in a straightforward network map. Information is provided via geographic maps and diagrams that quickly indicate performance patterns and outage symptoms. Easily accessible charts and reports also present collected statistics and device data. Network managers have all the details required for problem diagnosis and resolution.

Dartware, LLC develops the InterMapper® family of network monitoring software. InterMapper earns a quick return on investment by proactively alerting administrators to potential slow-downs, crashes, and other business interruptions. Its real-time, color-coded maps and other data displays provide users with an instant view of their network and device status. Dartware's software is installed in financial services, healthcare, retail, education, government and non-profit, WISP, and ISP organizations around the world.