SNMP: How Network Management Actually Works

Every serious network has dozens of devices that need monitoring and configuration. SNMP is the protocol that makes this manageable — here is how it works under the hood.

At Motorola we manage a sizeable network of routers, switches and hubs across multiple sites. Without a standard way to query and configure these devices remotely, the operations team would spend most of their time driving between floors with a laptop. SNMP — the Simple Network Management Protocol — is the standard that makes centralised network management possible, and if you run infrastructure at any scale it is worth understanding properly.

What SNMP Is

SNMP is an application-layer protocol that runs over UDP. Version 2 (SNMPv2c) is the current working version in most deployments. The protocol defines how a management system (NMS) communicates with network devices to read their state, receive alerts and change their configuration.

The architecture has three components:

  • Managed devices — routers, switches, hubs, servers. Anything you want to monitor.
  • SNMP agents — software running on each managed device that implements the protocol, holds the device's management data and responds to queries.
  • NMS (Network Management Station) — the central system that queries agents, collects data and raises alerts. OpenView, NetView or a custom application.

The MIB: Management Information Base

Every SNMP-managed device exposes its data through a MIB — a hierarchical tree of named data items called OIDs (Object Identifiers). The MIB is essentially a schema. It defines what data is available and what each item means.

The tree structure looks like this:

iso(1)
 └─ org(3)
     └─ dod(6)
         └─ internet(1)
             └─ mgmt(2)
                 └─ mib-2(1)
                     ├─ system(1)
                     │   ├─ sysDescr(1)       -- device description string
                     │   ├─ sysUpTime(3)       -- ticks since last reboot
                     │   └─ sysName(5)         -- hostname
                     └─ interfaces(2)
                         ├─ ifNumber(1)        -- number of interfaces
                         └─ ifTable(2)         -- table of interface data
                             └─ ifEntry(1)
                                 ├─ ifIndex(1)
                                 ├─ ifDescr(2)
                                 ├─ ifSpeed(5) -- bits per second
                                 └─ ifInOctets(10)

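The numeric OID for system uptime is 1.3.6.1.2.1.1.3. You can query it by number or by the human-readable name sysUpTime. Scalar objects like this one are read through instance 0, so the OID that actually goes into a request is 1.3.6.1.2.1.1.3.0.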
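To make the numbering concrete, here is a minimal Java sketch that parses a dotted OID and checks whether it sits under a given subtree. The names OidSketch, parse and isUnder are made up for illustration and do not come from any SNMP library.

```java
import java.util.Arrays;

final class OidSketch {

    /** Parses a dotted OID such as "1.3.6.1.2.1.1.3" into its numeric sub-identifiers. */
    static long[] parse(String dotted) {
        String[] parts = dotted.split("\\.");
        long[] ids = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // long rather than int: sub-identifiers may be as large as 2^32 - 1
            ids[i] = Long.parseLong(parts[i]);
        }
        return ids;
    }

    /** Returns true when oid equals the subtree root or lies somewhere beneath it. */
    static boolean isUnder(long[] oid, long[] subtree) {
        if (oid.length < subtree.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(oid, subtree.length), subtree);
    }
}
```

With this, isUnder(parse("1.3.6.1.2.1.1.3"), parse("1.3.6.1.2.1")) is true, because sysUpTime lives under mib-2.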

Vendors publish proprietary MIBs for their own equipment. Cisco has MIBs for IOS-specific features. Motorola publishes MIBs for our hardware. You load these into your NMS alongside the standard RFC 1213 MIB-II definitions.

The Five Operations

SNMPv2 defines five core operations:

GET — retrieve the current value of a specific OID.

GET 1.3.6.1.2.1.1.3.0   →   sysUpTime = 1234567 (hundredths of seconds)

GETNEXT — retrieve the next OID in the MIB tree. This is how you walk the tree without knowing every OID in advance.
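A walk is just GETNEXT in a loop. The sketch below simulates it against an in-memory table standing in for an agent, so it runs without a network. WalkSketch, sampleMib and the sample values are hypothetical; a real walk would send each request through your SNMP library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WalkSketch {

    /** Orders dotted OIDs numerically, component by component; a prefix sorts before its children. */
    static final Comparator<String> OID_ORDER = (a, b) -> {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = Long.compare(Long.parseLong(x[i]), Long.parseLong(y[i]));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    };

    /** A tiny made-up MIB: three system scalars plus ifNumber. */
    static TreeMap<String, String> sampleMib() {
        TreeMap<String, String> mib = new TreeMap<>(OID_ORDER);
        mib.put("1.3.6.1.2.1.2.1.0", "2");               // ifNumber
        mib.put("1.3.6.1.2.1.1.5.0", "router-1");        // sysName
        mib.put("1.3.6.1.2.1.1.1.0", "Example router");  // sysDescr
        mib.put("1.3.6.1.2.1.1.3.0", "1234567");         // sysUpTime
        return mib;
    }

    /** Stand-in for the agent: the first entry strictly after oid, or null at the end of the MIB. */
    static Map.Entry<String, String> getNext(TreeMap<String, String> mib, String oid) {
        return mib.higherEntry(oid);
    }

    /** Repeats GETNEXT from root and stops as soon as the returned OID leaves the subtree. */
    static List<String> walk(TreeMap<String, String> mib, String root) {
        List<String> visited = new ArrayList<>();
        String current = root;
        while (true) {
            Map.Entry<String, String> next = getNext(mib, current);
            if (next == null || !next.getKey().startsWith(root + ".")) {
                break;
            }
            visited.add(next.getKey());
            current = next.getKey();
        }
        return visited;
    }
}
```

Walking "1.3.6.1.2.1.1" over the sample data visits the three system scalars in OID order and stops when GETNEXT returns ifNumber, which belongs to the interfaces group.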

GETBULK — retrieve multiple OIDs in one request. SNMPv2 adds this operation specifically to reduce round trips when reading tables. Polling an entire interface table with individual GETs is painfully slow on a device with 48 ports.

// Conceptual: GETBULK to retrieve first 10 entries from ifTable
// max-repetitions = 10, non-repeaters = 0
// Returns up to 10 OID/value pairs in a single response
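The two GETBULK parameters interact in a way that is easy to get wrong. Per the GETBULK definition in RFC 1905, the first non-repeaters OIDs in the request are fetched once, like GETNEXT, and each remaining OID is repeated up to max-repetitions times. The sketch below computes the resulting upper bound on returned variable bindings; GetBulkSketch and its method name are illustrative only, and an agent may return fewer bindings if the response would be too large.

```java
final class GetBulkSketch {

    /**
     * Upper bound on variable bindings in a GETBULK response: N + (M * R), where
     * N = non-repeaters (capped at the request size), M = max-repetitions,
     * R = number of request OIDs left over after the first N.
     */
    static int maxResponseVarbinds(int requestVarbinds, int nonRepeaters, int maxRepetitions) {
        int n = Math.min(Math.max(nonRepeaters, 0), requestVarbinds);
        int m = Math.max(maxRepetitions, 0);
        int r = requestVarbinds - n;
        return n + m * r;
    }
}
```

A request for one ifTable column with non-repeaters = 0 and max-repetitions = 10 yields at most 10 bindings. A request for sysUpTime plus two interface counter columns with non-repeaters = 1 and max-repetitions = 10 yields at most 1 + 10 * 2 = 21.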

SET — write a value to a writable OID. This is how you change configuration remotely — shut down an interface, update a hostname, change a community string. Most deployments are conservative about what they allow via SET because it gives write access to live equipment.

TRAP — unsolicited notification from agent to NMS. When a link goes down, when a device reboots, when a threshold is crossed, the agent sends a trap without waiting to be asked. You configure your NMS to listen for traps on UDP port 162.
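At the transport level a trap receiver is nothing more than a UDP socket. The sketch below uses only java.net and stops at the raw bytes; decoding the trap PDU is left to an SNMP library. TrapListenerSketch and its methods are illustrative names, and the loopback helper exists only so the code can be exercised without a real agent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;

final class TrapListenerSketch {

    /** Standard trap port. Binding to it usually needs elevated privileges on Unix systems. */
    static final int TRAP_PORT = 162;

    /** Blocks until one datagram arrives (or the socket times out) and returns its payload. */
    static byte[] receiveOne(DatagramSocket socket) {
        byte[] buffer = new byte[65535]; // large enough for any UDP payload
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        try {
            socket.receive(packet);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(packet.getData(), packet.getLength());
    }

    /** Cheap sanity check: every SNMP message is a BER SEQUENCE, so it starts with tag 0x30. */
    static boolean looksLikeSnmp(byte[] datagram) {
        return datagram.length > 0 && datagram[0] == 0x30;
    }

    /** Sends payload to a receiver bound on an ephemeral loopback port and returns what arrived. */
    static byte[] loopbackRoundTrip(byte[] payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // do not hang forever if the datagram is lost
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));
            return receiveOne(receiver);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A production listener would open new DatagramSocket(TRAP_PORT) and call receiveOne in a loop, handing each payload to the decoder. Because traps ride on UDP they can be lost in transit, which is one reason to keep polling alongside them.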

Community Strings and Authentication

SNMPv1 and v2c use community strings as the authentication mechanism. A community string is essentially a shared password. Devices typically have:

  • A read community (commonly public) — required to send GET requests
  • A write community (commonly private) — required to send SET requests

This is the weakest part of SNMP v1/v2. Community strings travel in cleartext in UDP packets. Anyone on the network who can capture traffic can read your community strings and then query or configure your devices. For internal management networks this is acceptable — especially if the management traffic is on a dedicated VLAN that is not reachable from user segments. For anything crossing untrusted networks it is not.
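To see why a packet capture is all an attacker needs, it helps to look at the bytes. An SNMPv2c message opens with a version number followed by the community string, both in plain BER encoding. The sketch below encodes just those two fields; CommunityOnWireSketch is an illustrative name, and it handles only short strings to stay small.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class CommunityOnWireSketch {

    /** BER-encodes the version and community fields found at the start of an SNMPv2c message. */
    static byte[] encodeHeaderFields(String community) {
        byte[] text = community.getBytes(StandardCharsets.US_ASCII);
        if (text.length > 127) {
            throw new IllegalArgumentException("sketch supports short-form BER lengths only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x02);           // INTEGER tag
        out.write(0x01);           // length 1
        out.write(0x01);           // value 1 means SNMPv2c (0 would be SNMPv1)
        out.write(0x04);           // OCTET STRING tag
        out.write(text.length);    // length of the community string
        out.write(text, 0, text.length); // the community string itself, unencrypted
        return out.toByteArray();
    }
}
```

For the community public, the last six bytes of the result are the ASCII characters p, u, b, l, i, c. Nothing is hashed or encrypted, so any capture tool displays the string as readable text.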

SNMPv3 is under development and includes proper authentication with HMAC-MD5 and encryption with DES. We are watching the RFC process and expect to migrate once implementations mature.

For now: use non-default community strings, restrict SNMP access to your NMS IP via access lists on each device, and keep management traffic on a separate network segment.

Polling Strategy at Scale

On a network of 500 devices, naively polling every interface metric every 60 seconds generates substantial traffic and NMS load. Some principles that work in practice:

Poll the right things. Not every OID matters all the time. sysUpTime and interface counters are essential on every cycle. Detailed CPU and memory stats can be polled every 5 minutes. Link state traps handle outage detection without polling for it.

GETBULK for tables. A switch with 48 ports has a 48-row ifTable. GETBULK retrieves the whole table in far fewer round trips than sequential GETNEXT.
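A rough count shows the size of the saving. The sketch below estimates requests needed to read one table column, assuming the agent returns the full max-repetitions each time and that one extra binding is needed to detect the end of the column. Both assumptions are simplifications, and TableWalkCost is an illustrative name.

```java
final class TableWalkCost {

    /** GETNEXT: one request per row, plus one that runs off the end of the column. */
    static int getNextRequests(int rows) {
        return rows + 1;
    }

    /** GETBULK: the same rows + 1 bindings, delivered maxRepetitions at a time (rounded up). */
    static int getBulkRequests(int rows, int maxRepetitions) {
        if (maxRepetitions < 1) {
            throw new IllegalArgumentException("maxRepetitions must be at least 1");
        }
        return (rows + maxRepetitions) / maxRepetitions; // integer form of ceil((rows + 1) / maxRepetitions)
    }
}
```

For the 48-port switch, one column costs 49 GETNEXT requests against 5 GETBULK requests at max-repetitions = 10. Real agents may cap the response size, so treat the GETBULK figure as a best case.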

Calculate rates on the NMS. Interface counters like ifInOctets are monotonically increasing 32-bit counters. To get bandwidth utilisation you compute the delta between two samples, divide by the interval, and compare the resulting rate against ifSpeed. The counter wraps at 2^32 octets (about 4.3 GB), and fast links can wrap in minutes: a saturated 100 Mbit/s link gets there in under six minutes. Account for wrap in your code:

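// Delta between two samples of a 32-bit counter such as ifInOctets.
// Assumes at most one wrap between samples; an agent restart also resets the counter.
public long calculateDelta(long previous, long current) {
    if (current >= previous) {
        return current - previous;
    } else {
        // Counter wrapped past 2^32 - 1 (4294967295) back to zero
        return (4294967295L - previous) + current + 1;
    }
}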
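Turning a counter delta into a utilisation figure takes two more steps: convert octets per interval to bits per second, then divide by ifSpeed. The sketch below strings these together and also computes how long a link at full load takes to wrap the counter. RateSketch and its method names are illustrative, and it repeats the wrap-aware delta so that it stands on its own.

```java
final class RateSketch {

    /** Number of distinct values of a 32-bit counter: 2^32. */
    static final long COUNTER32_MODULUS = 1L << 32;

    /** Wrap-aware delta, valid when the counter wrapped at most once between samples. */
    static long delta(long previous, long current) {
        return current >= previous
                ? current - previous
                : COUNTER32_MODULUS - previous + current;
    }

    /** Average bit rate between two octet-counter samples taken intervalMillis apart. */
    static double bitsPerSecond(long previousOctets, long currentOctets, long intervalMillis) {
        return delta(previousOctets, currentOctets) * 8.0 * 1000.0 / intervalMillis;
    }

    /** Utilisation as a percentage of the interface speed reported by ifSpeed (bits per second). */
    static double utilisationPercent(double bitsPerSecond, long ifSpeedBps) {
        return 100.0 * bitsPerSecond / ifSpeedBps;
    }

    /** Seconds for a fully loaded link to push 2^32 octets, which is one full counter wrap. */
    static double secondsToWrap(long ifSpeedBps) {
        return COUNTER32_MODULUS * 8.0 / ifSpeedBps;
    }
}
```

secondsToWrap(100_000_000L) comes to roughly 344 seconds. The poll interval therefore has to stay well under that on a busy 100 Mbit/s link, because two wraps between samples look the same as one.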

Stagger poll intervals. If your NMS sends 500 GET requests simultaneously every 60 seconds, you create a burst. Spread polls across the interval.
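One simple way to spread polls is to give each device a fixed start offset inside the interval. The sketch below computes evenly spaced offsets and hands them to a standard ScheduledExecutorService. StaggerSketch is an illustrative name, and each Runnable stands in for whatever polls one device.

```java
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class StaggerSketch {

    /** Start offset for the device at deviceIndex (0-based), spreading deviceCount devices evenly. */
    static long offsetMillis(int deviceIndex, int deviceCount, long intervalMillis) {
        return (deviceIndex * intervalMillis) / deviceCount;
    }

    /** Schedules every poll at the same period, each one shifted by its own offset. */
    static void scheduleAll(ScheduledExecutorService executor, List<Runnable> polls, long intervalMillis) {
        for (int i = 0; i < polls.size(); i++) {
            long offset = offsetMillis(i, polls.size(), intervalMillis);
            executor.scheduleAtFixedRate(polls.get(i), offset, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }
}
```

With 500 devices and a 60-second interval the offsets are 120 ms apart, so the NMS sends a steady trickle of requests instead of one burst of 500.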

Putting It Together: A Simple Poller in Java

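The dominant SNMP library for Java right now is SNMP4J, which is still in early form; the alternative is writing directly to the API. Here is a sketch of the core polling loop written against a generic SNMP library interface. SnmpSession and DeviceMetrics are placeholders for whatever your library provides: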

public class DevicePoller {

    private SnmpSession session;
    private String targetIp;
    private String community;

    // OIDs we poll
    private static final String OID_SYS_UPTIME     = "1.3.6.1.2.1.1.3.0";
    private static final String OID_IF_IN_OCTETS   = "1.3.6.1.2.1.2.2.1.10";
    private static final String OID_IF_OUT_OCTETS  = "1.3.6.1.2.1.2.2.1.16";

    public DevicePoller(String targetIp, String community) throws Exception {
        this.targetIp  = targetIp;
        this.community = community;
        this.session   = new SnmpSession(targetIp, 161, community);
    }

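    public DeviceMetrics poll() throws Exception {
        DeviceMetrics metrics = new DeviceMetrics();
        metrics.uptime      = session.getLong(OID_SYS_UPTIME);
        // The ".1" suffix selects the row with ifIndex 1, so this reads one interface only
        metrics.inOctets    = session.getLong(OID_IF_IN_OCTETS  + ".1");
        metrics.outOctets   = session.getLong(OID_IF_OUT_OCTETS + ".1");
        // Local NMS clock time of the sample, needed later to turn counter deltas into rates
        metrics.timestamp   = System.currentTimeMillis();
        return metrics;
    }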
}

The real complexity is trap handling, MIB loading, table walks and correlation across devices — but the polling core is straightforward once you understand the protocol.

Why This Matters

Every network management tool in serious use today is built on SNMP. OpenView, NetView, Tivoli NetView — all of them poll SNMP agents and process traps. If you are building any infrastructure monitoring capability you need to understand what is happening underneath.

The protocol is not glamorous but it is solid. The MIB model has proven flexible enough to represent almost any device state. And understanding the underlying protocol means you can build custom monitoring for devices and metrics that your commercial NMS does not cover out of the box.