JVM Garbage Collection: Understanding Memory Without Black Magic

The JVM managing memory automatically sounded too good to be true. In 1997 it mostly was. Here is what was actually happening and how we dealt with it.


Garbage collection was one of Java's most controversial features in 1997. C and C++ developers were sceptical — how could automatic memory management be reliable, let alone fast? The answer, initially, was that it was not particularly fast. The JDK 1.1 garbage collector paused the entire JVM to collect. For a network management system with real-time alerting requirements, "stop the world" was not a theoretical problem.

Here is what we actually understood about the collector and what we did about it.

How the JDK 1.1 Collector Worked

JDK 1.1 used a mark-and-sweep collector operating on a single, undivided heap.

Mark phase: Starting from GC roots (static fields, local variables on thread stacks), the collector followed every object reference and marked each reachable object.

Sweep phase: The collector scanned the entire heap. Unmarked objects were dead. Their memory was reclaimed.

Both phases required the application to stop completely — the "stop-the-world" pause. All threads halted. No application code ran. Pause duration was proportional to heap size.
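The reachability rule driving the mark phase can be sketched in plain Java. This is an illustrative example, not code from the original system; the class and field names are ours:

```java
public class Reachability {
    // A static field is a GC root: whatever it references stays live.
    public static byte[] cache = new byte[1024];

    public static void main(String[] args) {
        byte[] temp = new byte[1024]; // thread-stack local: also a GC root
        temp = null;                  // that array is now unreachable
        cache = null;                 // so is the cached one
        // A JDK 1.1 collection would now mark from the remaining roots,
        // find neither array, and sweep their memory back to the free list.
        System.gc(); // a hint only; the VM decides when to actually collect
    }
}
```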

Heap at collection time:
┌─────────────────────────────────────────────────────┐
│ [LIVE][dead][LIVE][LIVE][dead][dead][LIVE][dead]     │
└─────────────────────────────────────────────────────┘

After sweep:
┌─────────────────────────────────────────────────────┐
│ [LIVE][    ][LIVE][LIVE][         ][LIVE][    ]      │
└─────────────────────────────────────────────────────┘
                             ↑ fragmentation

The fragmented heap was a secondary problem. Allocating an object requires a contiguous free block large enough to hold it; a fragmented heap has free space, but scattered in chunks too small to use. JDK 1.1 handled this with compaction (moving live objects together), which extended the pause further.

Measuring GC Behaviour

Before tuning anything, measure. We added verbose GC output:

java -verbose:gc -mx64m -ms32m com.motorola.nms.Main

This printed a line per collection:

[GC 4928K->1024K(16384K), 0.0652 secs]
[GC 5952K->1536K(16384K), 0.0731 secs]
[Full GC 8192K->2048K(16384K), 0.4821 secs]

The format is [type heapBefore->heapAfter(totalHeap), pauseSeconds]. Full GC pauses were consistently 400–500ms. On a system processing SNMP traps, losing 500ms every few minutes caused alert delays.
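To quantify the impact, log lines in this format can be totalled up. A minimal sketch; the regex is ours, written against the format shown above, not part of any JDK tool:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogStats {
    // Matches lines like: [Full GC 8192K->2048K(16384K), 0.4821 secs]
    private static final Pattern LINE =
        Pattern.compile("\\[(Full )?GC (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    /** Returns the total pause time in seconds across all GC lines. */
    public static double totalPauseSeconds(String[] lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(5)); // pause in seconds
            }
        }
        return total;
    }
}
```

Feeding a day's worth of output through something like this tells you what fraction of wall-clock time the application spent paused.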

Object Allocation Patterns

The GC runs when the heap fills. Reducing allocation frequency reduced GC frequency. We audited the hot paths:

// Bad: allocates a new String every poll cycle for 500 devices
public String formatStatus(String ip, DeviceStatus status) {
    return "Device " + ip + " status: " + status.toString();
}

String concatenation with + created intermediate String and StringBuffer objects. In a tight loop this generated significant garbage. The fix was to reuse objects:

// Better: reuse a StringBuffer
private final StringBuffer buffer = new StringBuffer(128);

public String formatStatus(String ip, DeviceStatus status) {
    buffer.setLength(0);
    buffer.append("Device ").append(ip).append(" status: ").append(status);
    return buffer.toString();
}

Even better for logging: only format when you are going to use the result.
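That guard looks like the sketch below. The debugEnabled flag and the method names are hypothetical stand-ins for whatever the logging framework provided; the point is that when the level is off, the hot path allocates nothing:

```java
public class StatusLogger {
    // Hypothetical flag; in practice this came from the logging framework.
    private final boolean debugEnabled;
    private final StringBuffer buffer = new StringBuffer(128);

    public StatusLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    /** Formats only when debug is on; returns the message, or null if skipped. */
    public String logStatus(String ip, String status) {
        if (!debugEnabled) {
            return null; // no StringBuffer work, no garbage
        }
        buffer.setLength(0);
        buffer.append("Device ").append(ip).append(" status: ").append(status);
        return buffer.toString();
    }
}
```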

Object Pooling

For objects that were expensive to create and short-lived, we used pooling:

public class SnmpPduPool {
    private final Stack available = new Stack(); // pre-generics JDK 1.1 collections
    private final int capacity = 100;

    public synchronized SnmpPdu acquire() {
        if (available.isEmpty()) {
            return new SnmpPdu();
        }
        return (SnmpPdu) available.pop();
    }

    public synchronized void release(SnmpPdu pdu) {
        if (available.size() < capacity) {
            pdu.reset();
            available.push(pdu);
        }
        // if the pool is full, let the object be GC'd
    }
}

Pooling trades GC pressure for synchronisation overhead. Profile before pooling — the overhead is not always worth it.
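Pooling also puts a burden on callers: every acquire must be paired with a release, even when processing throws, or pooled objects leak. A self-contained sketch of the pattern; the Pdu stub stands in for SnmpPdu, which is not shown here:

```java
import java.util.Stack;

public class PduPoolDemo {
    // Stand-in for SnmpPdu, just enough to show the acquire/release pattern.
    public static class Pdu {
        public void reset() { /* clear fields for reuse */ }
    }

    public static class Pool {
        private final Stack available = new Stack(); // raw type, as in JDK 1.1
        private final int capacity = 100;

        public synchronized Pdu acquire() {
            return available.isEmpty() ? new Pdu() : (Pdu) available.pop();
        }

        public synchronized void release(Pdu pdu) {
            pdu.reset();
            if (available.size() < capacity) {
                available.push(pdu);
            }
        }

        public synchronized int size() { return available.size(); }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        Pdu pdu = pool.acquire();
        try {
            // ... decode the trap, process it ...
        } finally {
            pool.release(pdu); // always return it, even if processing throws
        }
    }
}
```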

Heap Sizing

The heap size flags were among the most impactful tuning knobs:

-ms<size>   # initial heap size (-Xms in modern JVMs)
-mx<size>   # maximum heap size (-Xmx in modern JVMs)

A heap that was too small collected frequently. A heap that was too large had less frequent but longer pauses. We found a good balance by running with verbose GC and targeting collection frequency under once per minute for our workload.

java -verbose:gc -ms64m -mx128m com.motorola.nms.Main

Finalisers

Never use finalize() for anything time-sensitive. The JVM called finalisers on a single thread, after the GC had determined an object was unreachable. The timing was unpredictable and the finaliser queue backed up under load. We had one class that used finalize() to close a socket. Under high load, sockets accumulated in the finaliser queue until the JVM ran out of file descriptors.

The fix: explicit close() calls in a finally block.
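The shape of that fix, sketched with a minimal stand-in for the socket wrapper (the real code used java.net.Socket; Connection here is illustrative):

```java
public class FinallyClose {
    // Minimal stand-in for a socket so the pattern is testable offline.
    public static class Connection {
        public boolean open = true;
        public void send() { throw new RuntimeException("network error"); }
        public void close() { open = false; }
    }

    /** Returns the connection so callers can verify it was closed. */
    public static Connection poll() {
        Connection conn = new Connection();
        try {
            conn.send();
        } catch (RuntimeException e) {
            // handle / log the failure
        } finally {
            conn.close(); // runs on success and on error alike; no finaliser needed
        }
        return conn;
    }
}
```

Unlike finalize(), the finally block runs at a deterministic point, so the file descriptor is back with the OS before the method returns.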

What Changed Later

Java 1.2 introduced a generational collector based on the observation that most objects die young. The heap was split into young and old generations. Young generation collections were fast (tens of milliseconds); old generation collections were the expensive ones. This model — used in every JVM since — made GC tuning more tractable and production Java much more viable.

The fundamentals from 1997 still matter. Short-lived objects are good. Long-lived objects are expensive. Measure before tuning. Avoid finalisers. The names of the flags changed. The principles did not.