CCNA Lab 9: Load Troubleshooting and Switch Performance

A switch that is still forwarding packets but struggling under load produces intermittent failures that are hard to diagnose. Every L2 engineer needs to know how to read CPU utilization, memory pressure, TCAM exhaustion, and interface error counters.

CPU Utilization

The CPU on a switch handles control plane traffic (STP, routing protocols, management) and software-forwarded data packets. High CPU does not always mean a loop — process starvation, ACL logging, and SNMP polling can all cause it.

Checking CPU

show processes cpu
show processes cpu sorted
show processes cpu history

The history output shows a bar chart across time:

      100
       95
       90
       85
       80
   %   75
   C   70
   P   65
   U   60
       55
       50
       45
       40
       35
       30
       25
       20
       15
       10
        5
                ....#....#....#....#....#....#....#....#....#....
                  0    5    0    5    0    5    0    5    0
                       50   55   00   05   10   15   20   25

Occasional short spikes are normal. Utilization that stays above 80% across several consecutive intervals indicates a problem.
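The distinction between a brief spike and a sustained one can be sketched in a few lines of Python. This is a minimal illustration, assuming you have already collected one CPU percentage per interval (for example from SNMP or from repeated show processes cpu runs). The function name, the three-interval minimum, and the sample values are assumptions for illustration, not Cisco defaults.

```python
def sustained_high_cpu(samples, threshold=80.0, min_intervals=3):
    """Return (start_index, length) for each run of consecutive samples
    above `threshold` that lasts at least `min_intervals` intervals."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_intervals:
                runs.append((start, i - start))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(samples) - start >= min_intervals:
        runs.append((start, len(samples) - start))
    return runs


# Illustrative samples: one isolated spike, one sustained run, one short run.
samples = [35, 92, 40, 85, 88, 91, 60, 95, 97]
print(sustained_high_cpu(samples))  # [(3, 3)]
```

The isolated 92% reading and the final two-interval run are ignored; only the three consecutive intervals starting at index 3 are reported.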

Identifying the Hog Process

show processes cpu sorted | exclude 0.00

Look for processes consuming non-zero CPU:

PID    Runtime(ms)    Invoked     uSecs    5Sec    1Min    5Min    Process
269    1234567        98765       12493     12.23%  8.45%   5.67%   ARP Input
112    456789         12345       36987     8.45%   6.12%   3.89%   IP Input

High values in ARP Input often point to an ARP storm, a subnet scan, or a host with an incorrect (overly broad) subnet mask that ARPs for destinations it should send to its gateway. High CPU in IP Input suggests routed traffic is being process-switched instead of handled by CEF.
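If the output is captured to a file or collected by a script, the same triage can be done programmatically. The sketch below parses rows in the column layout shown above and lists processes whose one-minute average crosses a threshold. The function names and the 5% threshold are assumptions for illustration; some IOS releases print an extra TTY column before the process name, so the pattern treats it as optional.

```python
import re

# One table row: PID, Runtime(ms), Invoked, uSecs, three percentages,
# an optional TTY number, then the process name.
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<runtime>\d+)\s+(?P<invoked>\d+)\s+(?P<usecs>\d+)\s+"
    r"(?P<five_sec>[\d.]+)%\s+(?P<one_min>[\d.]+)%\s+(?P<five_min>[\d.]+)%\s+"
    r"(?:(?P<tty>\d+)\s+)?(?P<name>.+?)\s*$"
)


def parse_cpu_table(text):
    """Turn 'show processes cpu sorted' rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and summary lines do not match and are skipped
            rows.append({
                "pid": int(m["pid"]),
                "five_sec": float(m["five_sec"]),
                "one_min": float(m["one_min"]),
                "five_min": float(m["five_min"]),
                "name": m["name"],
            })
    return rows


def hogs(rows, one_min_threshold=5.0):
    """Processes at or above the threshold, busiest first."""
    busy = [r for r in rows if r["one_min"] >= one_min_threshold]
    return sorted(busy, key=lambda r: r["one_min"], reverse=True)


# First two rows are from the example above; the third is illustrative.
sample = """\
PID    Runtime(ms)    Invoked     uSecs    5Sec    1Min    5Min    Process
269    1234567        98765       12493     12.23%  8.45%   5.67%   ARP Input
112    456789         12345       36987     8.45%   6.12%   3.89%   IP Input
 45    1000           200         5         0.10%   0.08%   0.05%   Net Background
"""
for row in hogs(parse_cpu_table(sample)):
    print(row["name"], row["one_min"])
```

Sorting on the one-minute column rather than the five-second column filters out momentary bursts.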

Common CPU Hogs

Process	Likely Cause
ARP Input	ARP flooding, subnet scan, STP TCN storm
IP Input	CEF disabled, punted traffic
Spanning Tree	Topology changes, TCN flood
Cat4k Mgmt	High SNMP polling, too many OIDs
ACL Logging	ACL with `log` keyword hit by every packet
DHCP Snooping	High DHCP request rate

Memory Management

Check Memory

show memory
show memory statistics
show processes memory
show processes memory sorted

Memory Leak Detection

! Sort by memory currently held and read the top rows (classic IOS has no "head" filter)
show processes memory sorted holding

Compare snapshots taken hours or days apart during the same uptime, since a reload resets the baseline. A process whose held memory grows continuously without release is leaking.
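One way to spot continuous growth is to record each process's held memory at regular intervals and compare the readings. The sketch below assumes you have already extracted per-process byte counts into dictionaries, oldest first. The function name, the 1 MB growth floor, and the sample figures are illustrative assumptions.

```python
def leaking_processes(snapshots, min_growth_bytes=1_000_000):
    """Flag processes whose held memory never shrinks across snapshots
    and grows by at least `min_growth_bytes` overall.

    snapshots: list of {process_name: held_bytes}, oldest first.
    Returns [(process_name, growth_bytes)], largest growth first.
    """
    if len(snapshots) < 3:
        return []  # two points cannot distinguish a leak from normal churn

    # Only consider processes present in every snapshot.
    common = set(snapshots[0])
    for snap in snapshots[1:]:
        common &= set(snap)

    suspects = []
    for name in sorted(common):
        series = [snap[name] for snap in snapshots]
        never_shrinks = all(later >= earlier
                            for earlier, later in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if never_shrinks and growth >= min_growth_bytes:
            suspects.append((name, growth))
    return sorted(suspects, key=lambda item: item[1], reverse=True)


# Illustrative readings taken several hours apart.
snapshots = [
    {"SNMP ENGINE": 4_000_000, "ARP Input": 900_000, "IP Input": 1_200_000},
    {"SNMP ENGINE": 6_500_000, "ARP Input": 950_000, "IP Input": 1_100_000},
    {"SNMP ENGINE": 9_200_000, "ARP Input": 940_000, "IP Input": 1_300_000},
]
print(leaking_processes(snapshots))  # [('SNMP ENGINE', 5200000)]
```

Processes whose usage rises and falls are excluded, because normal allocation churn releases memory between readings.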

Low Memory Symptoms

Configuration changes fail with % Not enough space
SSH sessions drop
SNMP polling fails intermittently
show commands return partial output
Syslog: %SYS-2-MALLOCFAIL

Emergency recovery: Reload the switch during a maintenance window. There is no graceful memory reclamation on most Catalyst switches.

TCAM Exhaustion

TCAM (Ternary Content Addressable Memory) holds ACL entries, QoS policies, and forwarding entries so they can be matched at wire speed. When TCAM fills, new entries are either rejected or handled in software by the CPU.

Check TCAM Utilization

show platform tcam utilization
show platform tcam counts
show sdm prefer

Example output:

CAM Utilization for ASIC 0
                                                 Max            Used
                                             Masks/Values   Masks/Values
 Unicast mac addresses:                       6384/6384       512/512
 IPv4 IGMP groups +                           1024/1024         8/8
 IPv4 unicast routes:                         2816/2816        12/12
 IPv4 direct adjacencies before load share:   2816/2816        12/12

If any category exceeds 80%, plan for expansion or TCAM optimization.
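The 80% check is simple arithmetic: used values divided by maximum values for each category. The sketch below applies it to rows in the Max/Used layout shown above. The function names are assumptions, and the sample deliberately raises one Used figure so that a category crosses the threshold.

```python
import re

# A category row: label, optional colon, then Max masks/values and Used masks/values.
ROW = re.compile(
    r"^\s*(?P<name>.+?):?\s+(?P<max_m>\d+)/(?P<max_v>\d+)"
    r"\s+(?P<used_m>\d+)/(?P<used_v>\d+)\s*$"
)


def tcam_usage(text):
    """Map each TCAM category to its percentage of values in use."""
    usage = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m and int(m["max_v"]) > 0:
            usage[m["name"].strip()] = 100.0 * int(m["used_v"]) / int(m["max_v"])
    return usage


def over_threshold(usage, threshold=80.0):
    """Categories whose utilization exceeds the threshold."""
    return [name for name, pct in usage.items() if pct > threshold]


# Illustrative output; the unicast route usage is inflated on purpose.
sample = """\
CAM Utilization for ASIC 0
                          Max            Used
                      Masks/Values   Masks/Values
 Unicast mac addresses:  6384/6384     512/512
 IPv4 unicast routes:    2816/2816    2400/2400
"""
usage = tcam_usage(sample)
for name, pct in usage.items():
    print(f"{name}: {pct:.1f}%")
print("Over 80%:", over_threshold(usage))
```

With the example figures from this section, 512 of 6384 MAC entries is about 8%, well below the planning threshold.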

SDM Templates

Switches use SDM (Switch Database Management) templates to allocate TCAM:

show sdm prefer
 
! Change template (requires reload)
sdm prefer vlan      ! Maximizes MAC addresses
sdm prefer routing   ! Maximizes IPv4 routes
sdm prefer access    ! Maximizes ACEs

Changing the template requires a reload. Plan accordingly.

Interface Error Counters

High CRC, runts, giants, or collisions point to physical layer issues.

show interfaces Gi0/1
show interfaces Gi0/1 counters errors
show interfaces stats

Key Error Types

Error	Meaning
CRC	FCS error — bad cable, SFP, or duplex mismatch
Runts	Frame < 64 bytes — collisions or bad NIC
Giants	Frame > 1518 bytes — misconfigured MTU
Input errors	Sum of all receive-side errors
Output errors	Sum of all transmit-side errors
Collisions	Late collisions = duplex mismatch
Overruns	Switch cannot keep up with ingress rate
Underruns	Switch cannot feed the egress line rate
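The runt and giant boundaries in the table can be expressed as a small classifier. This is a sketch using the 64-byte and 1518-byte limits stated above; the function name is an assumption. An 802.1Q tag adds 4 bytes, so the maximum is left as a parameter for tagged or jumbo-enabled ports.

```python
MIN_FRAME = 64     # bytes; smaller frames are runts
MAX_FRAME = 1518   # bytes; untagged maximum from the table above


def classify_frame(length, max_frame=MAX_FRAME):
    """Classify a frame length as 'runt', 'giant', or 'ok'."""
    if length < MIN_FRAME:
        return "runt"
    if length > max_frame:
        return "giant"
    return "ok"


for size in (60, 64, 1518, 1522):
    print(size, classify_frame(size))
# A tagged frame is legal when the limit accounts for the 4-byte tag.
print(1522, classify_frame(1522, max_frame=1522))
```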

Error Threshold

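On a healthy link the CRC count stays at zero. Counters are cumulative since the last clear or reload, so clear them and check whether the count increments again; any new CRC errors point to a physical issue.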

! Clear counters for fresh measurement
clear counters GigabitEthernet0/1
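Because interface counters accumulate until they are cleared, comparing two readings shows whether errors are still occurring or are left over from an old fault. The sketch below assumes the counters have already been extracted into dictionaries; the function names and the watched counter names are illustrative assumptions.

```python
def counter_deltas(before, after):
    """Per-counter increase between two readings.

    If a counter went down, assume it was cleared in between
    and count the new value from zero.
    """
    deltas = {}
    for name, new in after.items():
        old = before.get(name, 0)
        deltas[name] = new - old if new >= old else new
    return deltas


def still_incrementing(before, after, watch=("CRC", "Runts", "Giants")):
    """Watched counters that increased between the two readings."""
    deltas = counter_deltas(before, after)
    return {name: deltas[name] for name in watch if deltas.get(name, 0) > 0}


# Illustrative readings taken ten minutes apart.
before = {"CRC": 4523, "Runts": 234, "Giants": 0}
after = {"CRC": 4523, "Runts": 240, "Giants": 0}
print(still_incrementing(before, after))  # {'Runts': 6}
```

Here the large CRC total is historical: it did not move between readings, while runts are still arriving.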

Duplex Mismatch Detection

show interfaces Gi0/1 | include duplex

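The most reliable sign of a duplex mismatch is late collisions on the half-duplex side and CRC errors (often with runts) on the full-duplex side. The usual cause is one end hard-set to full duplex while the other auto-negotiates and falls back to half. Configure both ends the same way: both auto or both hard-set. 1000BASE-T links require auto-negotiation.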
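The counter signature described above can be turned into a simple hint. This is a heuristic sketch, not a definitive test: CRC errors alone can also come from a bad cable or SFP. The function name and the returned strings are assumptions for illustration.

```python
def duplex_hint(duplex, late_collisions=0, crc_errors=0, runts=0):
    """Suggest which side of a duplex mismatch a port might be on.

    duplex: text from 'show interfaces', e.g. 'Half-duplex' or 'Full-duplex'.
    """
    side = duplex.strip().lower()
    if side.startswith("half") and late_collisions > 0:
        return "likely half side of a mismatch: late collisions present"
    if side.startswith("full") and (crc_errors > 0 or runts > 0):
        return "possible full side of a mismatch: check far end for late collisions"
    return "no mismatch signature"


print(duplex_hint("Half-duplex", late_collisions=89, crc_errors=4523))
print(duplex_hint("Full-duplex", crc_errors=120))
print(duplex_hint("Full-duplex"))
```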

Switch Backplane Saturation

Backplane saturation means the switch fabric cannot handle the aggregate traffic offered by all ports.

Monitoring Backplane

show platform port-asic statistics
show platform backplane rate
show controllers ethernet-controller

High overrun/submit errors on multiple ports simultaneously suggest the backplane is saturated.
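The "multiple ports simultaneously" test can be sketched as a count over per-port counter increases from the same measurement interval. The function name, the three-port minimum, and the sample figures are assumptions for illustration; a single port with overruns points to that port rather than the fabric.

```python
def saturated_backplane(overrun_deltas, min_ports=3, min_overruns=1):
    """Return (suspect, affected_ports).

    overrun_deltas: {port_name: overruns added during one interval}.
    suspect is True when at least `min_ports` ports show overruns at once.
    """
    affected = sorted(port for port, count in overrun_deltas.items()
                      if count >= min_overruns)
    return len(affected) >= min_ports, affected


# Illustrative per-port increases over the same interval.
deltas = {"Gi0/1": 120, "Gi0/2": 0, "Gi0/5": 87, "Gi0/9": 45}
print(saturated_backplane(deltas))  # (True, ['Gi0/1', 'Gi0/5', 'Gi0/9'])
```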

Real-World Scenarios

Scenario 1: “Slow SSH and intermittent SNMP timeouts”

show processes cpu | include CPU
# CPU: 5 sec = 92%, 1 min = 88%, 5 min = 75%
 
show processes cpu sorted | exclude 0.00
# ARP Input at 25%, IP Input at 18%, ACL-Log at 12%
 
show interfaces | include broadcast
# Gi0/24: 12000 broadcast packets/sec
 
show running-config | include log
# access-list 100 permit tcp any any log

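An ACL entry with the log keyword punts every matching packet to the CPU, and access-list 100 matches all TCP traffic. Remove the log keyword from ACL entries in the forwarding path, then trace the broadcast source on Gi0/24, which likely explains the ARP Input load.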
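Filtering the running configuration on the string "log" also returns logging commands and anything else containing those letters. The sketch below narrows a saved configuration down to permit or deny entries that end in log or log-input. The function name and the sample configuration, including the ACL name EDGE-IN, are hypothetical.

```python
import re

# A numbered or named ACL entry whose last keyword is log or log-input.
ACE_WITH_LOG = re.compile(
    r"^\s*(?:access-list\s+\S+\s+)?(?:\d+\s+)?(?:permit|deny)\b.*\s(?:log|log-input)\s*$"
)


def logged_aces(config_text):
    """Return ACL entries from a saved configuration that use logging."""
    return [line.strip() for line in config_text.splitlines()
            if ACE_WITH_LOG.match(line)]


# Hypothetical configuration excerpt.
config = """\
logging buffered 16384
access-list 100 permit tcp any any log
ip access-list extended EDGE-IN
 permit udp any any eq 53
 deny ip any any log-input
"""
for entry in logged_aces(config):
    print(entry)
```

The logging buffered line and the unlogged permit entry are skipped; only the two entries that punt matches to the CPU are listed.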

Scenario 2: “Users report random drops on one switch”

show interfaces Gi0/1 counters errors
# CRC: 4523, Runts: 234, Late Collisions: 89
 
show interfaces Gi0/1 | include duplex
# Half-duplex
 
show interface Gi0/1 | include speed
# 10 Mbps

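The port came up at 10/half instead of 100/full. Late collisions on a half-duplex port usually mean the far end is running full duplex, so check the far-end NIC settings as well as the cable. If you hard-set the interface, set the far end to the same speed and duplex: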

interface Gi0/1
 speed 100
 duplex full

Scenario 3: “Show commands take forever”

show processes cpu | include CPU
# CPU: 5 sec = 15%, 1 min = 12%, 5 min = 10%
 
show memory statistics
# Free memory: 45MB out of 256MB (low)

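IOS has no swap space. When free memory or the largest free block runs low, allocations fail and CLI output becomes slow or truncated. Find the process holding the most memory with show processes memory sorted, then schedule a reload.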

TCAM Troubleshooting

! Check if an ACL entry was installed in TCAM
show platform tcam interface Gi0/1 acl
 
! Check if QoS policy matches in hardware
show mls qos interface Gi0/1 statistics

If TCAM is full, the switch either drops new ACL entries silently or punts them to software. Always check TCAM after adding large ACLs or QoS policies.

Proactive Monitoring Commands

Run these weekly on every switch:

show processes cpu sorted
show memory statistics
show platform tcam utilization
show interfaces counters errors
show interfaces | include line protocol|rate
show environment
show logging | include down|err|flap|MALLOCFAIL
show controllers ethernet-controller | include overrun
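Saved show logging output can be summarized rather than read line by line. IOS messages carry a %FACILITY-SEVERITY-MNEMONIC tag, where lower severity numbers are more serious. The sketch below counts messages by tag and lists those at severity 3 (error) or worse. The function names, the severity cutoff, and the sample lines are illustrative assumptions.

```python
import re
from collections import Counter

# %FACILITY-SEVERITY-MNEMONIC, e.g. %SYS-2-MALLOCFAIL
TAG = re.compile(r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)")


def summarize(log_text):
    """Count messages keyed by (severity, 'FACILITY-MNEMONIC')."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TAG.search(line)
        if m:
            key = (int(m["severity"]), f"{m['facility']}-{m['mnemonic']}")
            counts[key] += 1
    return counts


def urgent(counts, max_severity=3):
    """Message types at or below `max_severity`, most serious first."""
    return sorted(key for key in counts if key[0] <= max_severity)


# Illustrative log excerpt.
log = """\
*Mar  1 00:10:01: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down
*Mar  1 00:10:02: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down
*Mar  1 00:12:40: %SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed
*Mar  1 00:13:05: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up
"""
counts = summarize(log)
print(urgent(counts))  # [(2, 'SYS-MALLOCFAIL'), (3, 'LINK-UPDOWN')]
```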

Best Practices

Enable CEF: ip cef keeps routed traffic on the CEF forwarding path. With CEF disabled, routed traffic is process-switched by the CPU.
Limit SNMP polling: Poll no more than once per 5 minutes on production switches.
Remove ACL logging: Never use the log keyword on ACLs in the forwarding path.
Set speed/duplex consistently: If you hard-set a critical link, hard-set both ends. A hard-set port facing an auto-negotiating port causes a duplex mismatch.
Monitor TCAM: Check utilization before ACL or QoS changes.
Schedule maintenance reloads: Memory fragmentation grows over time.
Use the SDM template matching your role: Access switches need MAC addresses; distribution switches need routes.

Quick Commands

Command	When to Use
`show processes cpu sorted`	High CPU — find the hog process
`show processes memory sorted`	Memory leak — find the leaking process
`show platform tcam utilization`	TCAM exhaustion — before adding ACLs
`show interfaces counters errors`	Physical layer issues
`clear counters`	After replacing cable/SFP, verify fix
`show environment`	Temperature and power health
`show controllers ethernet-controller`	Backplane / ASIC errors
`show sdm prefer`	Current TCAM allocation template
`terminal monitor`	View logs in SSH session
`show logging`	Recent events and errors