strace — The Sysadmin's Microscope
Every senior Linux admin has a story where strace saved the day. When an application silently fails, hangs, or crashes, strace reveals exactly what’s happening between the process and the kernel.
strace intercepts and records system calls, the interface between user-space programs and the Linux kernel. Every file open, network connection, request for more memory from the kernel, and process fork goes through a system call. strace shows you each one.
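Trace a command from start to finish:
strace ls -l /tmp
Attach to a running process by PID:
strace -p 2345
When a process spawns children, -f follows them. -f only picks up children created after strace attaches, so list every existing PID too. Pass the list as one quoted, space-separated argument; an unquoted $(pgrep nginx) with several PIDs would make strace treat the extra PIDs as a command to run:
strace -f -p "$(pidof nginx)"
Instead of seeing every call, get a summary of counts and times:
strace -c -p 1234
Hit Ctrl+C after a few seconds. You’ll see which syscalls are called most and where time is spent.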
Focus on specific syscall families:
# File operations only
strace -e trace=file -p 2345
# Network-related syscalls
strace -e trace=network -p 2345
# Process management
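strace -e trace=process -p 2345
Add timestamps to see when each call happened:
# Relative timestamps between syscalls
strace -r -p 2345
# Absolute wall-clock time
strace -t -p 2345
# Wall-clock time with microsecond precision
strace -tt -p 2345
A web application returns 500 errors. The logs say “permission denied” but don’t say which file.
strace -f -e trace=file,connect -p "$(pgrep -d ' ' -f uwsgi)" 2>&1 | grep -i "denied\|eacces"
This reveals exactly which file the process cannot access. Nine times out of ten it’s a log file or a Unix socket. The file class covers open, openat, and the stat variants (such as newfstatat and statx) that current glibc versions often call in place of stat, and connect catches a Unix socket the process is not allowed to use.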
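On a busy service the grep output can scroll past quickly. A minimal sketch, assuming you first save the trace to a file with -o (for example strace -f -o /tmp/uwsgi.trace -e trace=file,connect -p "$(pgrep -d ' ' -f uwsgi)"), collapses the failures into one line per path. The function name denied_paths and the log path are hypothetical, and the sed expression assumes the first double-quoted string on a failing line is the path, which holds for open, openat, the stat family, and connect to a Unix socket.

```shell
# denied_paths (hypothetical helper): summarize EACCES failures in a saved
# strace log, one line per path, most frequent first.
# Assumes the first double-quoted string on a failing line is the path.
# Lines without a quoted path (for example the "<... openat resumed>" half
# of a call split by -f) are skipped.
denied_paths() {
  local log="$1"
  grep 'EACCES' "$log" \
    | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Example (log path is a placeholder):
#   denied_paths /tmp/uwsgi.trace
```

Each output line is a count followed by the path, so the file or socket that fails most often appears first.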
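A database connection pool is exhausted and the app hangs. Find out what every worker is waiting on:
for pid in $(pgrep -f gunicorn); do
  echo "=== PID $pid ==="
  # timeout detaches strace after 5 seconds; a blocked worker may print a
  # single line, so head -20 alone would wait forever
  timeout 5 strace -p "$pid" 2>&1 | head -20
done
You’ll see workers stuck on connect() or poll() to an unresponsive database. The loop deliberately does not filter with -e trace=network: poll() belongs to the descriptor class, not the network class, so that filter would hide it.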
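With many workers, attaching to them one at a time is slow. A minimal sketch, assuming strace's -ff option (which, together with -o PREFIX, writes one file per traced process named PREFIX.PID): trace all workers at once for a few seconds, then print the last line recorded for each, which is usually the call the worker was blocked in. The helper name last_calls and the /tmp/gunicorn prefix are hypothetical.

```shell
# last_calls (hypothetical helper): print the last line of each per-PID
# trace file written by `strace -ff -o PREFIX`, labeled with the PID.
# Remove old PREFIX.* files before tracing so stale traces are not mixed in.
last_calls() {
  local prefix="$1" f
  for f in "$prefix".*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # ${f##*.} strips everything up to the last dot, leaving the PID
    printf '%s: %s\n' "${f##*.}" "$(tail -n 1 "$f")"
  done
}

# Example (needs ptrace permission, usually root):
#   timeout 5 strace -ff -o /tmp/gunicorn -p "$(pgrep -d ' ' -f gunicorn)"
#   last_calls /tmp/gunicorn
```

One strace process tracing every worker for the same five seconds gives a consistent snapshot, instead of a sequence of five-second samples taken at different times.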
A service takes 30 seconds to start. Find where the time goes:
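strace -T -f -o /tmp/startup.log /etc/init.d/myservice start
The -T flag appends the time spent in each syscall to the end of the line, in angle brackets such as <0.000123>. Because -f also prefixes each line with a PID, read the duration from the end of the line rather than from a fixed column, and sort by it:
awk 'match($0, /<[0-9]+\.[0-9]+>$/) { print substr($0, RSTART + 1, RLENGTH - 2), $0 }' /tmp/startup.log | sort -rn | head -10
You updated a config file but the application ignores it. Verify which files the process actually opens: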
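strace -f -e openat -p "$PID" 2>&1 | grep "\.conf\|\.yml\|\.json"
Attaching only shows files opened from that moment on. If the application reads its config only at startup or on reload, trigger a reload while strace is attached, or trace the start command itself.
strace adds significant overhead. On a busy production server, tracing every syscall can slow a process by 10-100x. Use these techniques to minimize impact:
- Use -e trace= to trace only the relevant syscalls.
- Use -c (summary mode) instead of per-call tracing.
- Use timeout 5 strace -p $PID to stop automatically after 5 seconds.
strace is not a tool you leave running. It is a surgical instrument: apply it briefly, gather data, and move on.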
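Gathered data is easiest to study offline. For the slow-startup case, the same -T log (/tmp/startup.log) can be grouped by syscall name to show which kinds of calls account for the waiting. A minimal sketch, assuming the log was written with -T and optionally -f, but without -t, -tt, or -r (those prefix each line with a timestamp this parser does not expect); the helper name syscall_time is hypothetical. With -f, calls in different processes can overlap, so the totals are not wall-clock time.

```shell
# syscall_time (hypothetical helper): total the -T durations in a strace log
# per syscall name. Prints "total_seconds calls name", largest total first.
# Assumes lines look like "[PID  ]name(args) = ret <secs>", i.e. the log was
# written with -T (and optionally -f) but without -t, -tt or -r.
syscall_time() {
  awk '
    match($0, /<[0-9]+\.[0-9]+>$/) {
      t = substr($0, RSTART + 1, RLENGTH - 2)
      line = $0
      sub(/^[0-9]+ +/, "", line)               # drop the PID prefix added by -f
      if (match(line, /^<\.\.\. [a-z0-9_]+ resumed>/))
        name = substr(line, 6, RLENGTH - 14)   # resumed half of a split call
      else if (match(line, /^[a-z0-9_]+\(/))
        name = substr(line, 1, RLENGTH - 1)    # ordinary NAME( line
      else
        next
      total[name] += t
      calls[name]++
    }
    END { for (n in total) printf "%.6f %d %s\n", total[n], calls[n], n }
  ' "$1" | sort -rn
}

# Example:
#   syscall_time /tmp/startup.log | head -5
```

Calls split by -f into an "<unfinished ...>" line and a "<... NAME resumed>" line are counted once, from the resumed half, which is where strace prints the duration.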