strace — The Sysadmin's Microscope

Every senior Linux admin has a story where strace saved the day. When an application silently fails, hangs, or crashes, strace reveals exactly what’s happening between the process and the kernel.

What is strace?

strace intercepts and records system calls, the interface between user-space programs and the Linux kernel. Opening a file, making a network connection, mapping memory, and forking a process all go through system calls. strace shows you each one, with its arguments and return value.

Basic Usage

Trace a command from start to finish:

strace ls -l /tmp

Attach to a running process by PID:

strace -p 2345

Essential Flags

Follow forks (-f)

When a process spawns children, -f follows them:

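# Quote the substitution: pgrep may print several PIDs, and strace accepts
# a whitespace-separated list as a single -p argument
strace -f -p "$(pgrep nginx)"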

Summarize with -c

Instead of seeing every call, get a summary of counts and times:

strace -c -p 1234

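Hit Ctrl+C after a few seconds. You’ll see which syscalls are called most and where time is spent. By default the time column counts CPU time spent in the kernel, so a call that merely blocks looks cheap; add -w to summarize wall-clock time instead.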

Filter syscalls (-e)

Focus on specific syscall families:

# File operations only
strace -e trace=file -p 2345
 
# Network-related syscalls
strace -e trace=network -p 2345
 
# Process management
strace -e trace=process -p 2345

Timestamps

# Relative timestamps between syscalls
strace -r -p 2345
 
# Absolute wall-clock time
strace -t -p 2345
 
# Microsecond precision
strace -tt -p 2345

Real-World Troubleshooting Scenarios

Scenario 1: “Permission denied” with no helpful error

A web application returns 500 errors. The logs say “permission denied” but don’t say which file.

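# trace=file covers every syscall that takes a path (stat is not a syscall on
# all architectures); connect catches Unix sockets. Quote the PID list.
strace -f -e trace=file,connect -p "$(pgrep -f uwsgi)" 2>&1 | grep -i "denied\|eacces"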

This reveals exactly which file the process cannot access. Nine times out of ten it’s a log file or a Unix socket.
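When the trace is long, it can help to reduce it to a list of paths. The sketch below is one possible filter, not part of strace itself: denied_paths is a hypothetical helper name, and it assumes the usual strace layout in which the path is the first double-quoted argument on the line. The sample input is illustrative only.

```shell
# denied_paths: read strace output on stdin and print each distinct path
# whose syscall failed with EACCES. Assumes the path is the first
# double-quoted string on the line (hypothetical helper, not a strace feature).
denied_paths() {
  awk '/= -1 EACCES/ {
         if (match($0, /"[^"]*"/))                  # first quoted string
           print substr($0, RSTART + 1, RLENGTH - 2)
       }' | sort -u
}

# Illustrative input only; a real trace will look different.
printf '%s\n' \
  'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3' \
  'openat(AT_FDCWD, "/var/log/app/error.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = -1 EACCES (Permission denied)' \
  'connect(7, {sa_family=AF_UNIX, sun_path="/run/app/app.sock"}, 110) = -1 EACCES (Permission denied)' |
  denied_paths
```

In practice you would pipe the strace command's stderr (2>&1) into denied_paths instead of the sample lines.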

Scenario 2: Application hanging

A database connection pool is exhausted and the app hangs. Find out what every worker is waiting on:

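for pid in $(pgrep -f gunicorn); do
  echo "=== PID $pid ==="
  # A blocked worker prints almost nothing, so head would wait forever;
  # timeout detaches strace after 5 seconds and the loop moves on.
  # The desc class adds poll/select/epoll, which trace=network does not cover.
  timeout 5 strace -e trace=network,desc -p "$pid" 2>&1 | head -20
done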

You’ll see workers stuck on connect() or poll() to an unresponsive database.
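If you capture all workers into one file with -f -o, the blocked calls are the ones strace marks as unfinished and never resumes. The following is a minimal sketch for pulling those out. It assumes the -f -o line layout (each line starts with a PID), and pending_calls is a hypothetical helper name. The sample lines are illustrative, not captured output.

```shell
# pending_calls: print, per PID, the last syscall that was started but never
# completed in a "strace -f -o FILE" log (reads FILE arguments or stdin).
pending_calls() {
  awk '
    # "<... name resumed>" closes the pending call for that PID
    /^[0-9]+ +<\.\.\. [A-Za-z0-9_]+ resumed>/ {
      if ($0 !~ /<detached \.\.\.>$/) delete pending[$1]
      next
    }
    # a line cut short by another process, or by strace detaching
    /<(unfinished|detached) \.\.\.>$/ {
      call = $0
      sub(/ *<(unfinished|detached) \.\.\.>$/, "", call)
      pending[$1] = call
    }
    END { for (pid in pending) print pending[pid] }
  ' "$@" | sort -n
}

# Illustrative input only.
printf '%s\n' \
  '4101 connect(9, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("10.0.0.5")}, 16 <unfinished ...>' \
  '4102 poll([{fd=9, events=POLLIN}], 1, 30000 <unfinished ...>' \
  '4103 recvfrom(9,  <unfinished ...>' \
  '4103 <... recvfrom resumed>"OK", 8192, 0, NULL, NULL) = 2' |
  pending_calls
```

With the sample input, PIDs 4101 and 4102 are reported and 4103 is not, because its recvfrom completed.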

Scenario 3: Slow startup

A service takes 30 seconds to start. Find where the time goes:

strace -T -f -o /tmp/startup.log /etc/init.d/myservice start

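The -T flag appends the time spent in each syscall, in angle brackets, to the end of the line. Pull that value to the front and sort by it:

awk '$NF ~ /^<[0-9.]+>$/ { print substr($NF, 2, length($NF) - 2), $0 }' /tmp/startup.log | sort -rn | head -10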
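A single slow call is easy to spot, but startup time is often lost to many medium-sized calls. As a rough sketch, the durations can also be summed per syscall name. This assumes a log written with -T -f -o, so each line starts with a PID and ends with a duration in angle brackets. time_by_syscall is a hypothetical helper name and the sample lines are illustrative.

```shell
# time_by_syscall: sum the -T durations per syscall name in a
# "strace -T -f -o FILE" log; prints "total_seconds call_count name".
time_by_syscall() {
  awk '
    $NF ~ /^<[0-9]+\.[0-9]+>$/ {
      secs = substr($NF, 2, length($NF) - 2)   # strip the angle brackets
      name = $2
      if (name == "<...") name = $3            # "<... read resumed>" lines
      sub(/\(.*/, "", name)                    # keep only the syscall name
      total[name] += secs
      calls[name]++
    }
    END { for (n in total) printf "%.6f %d %s\n", total[n], calls[n], n }
  ' "$@" | sort -rn
}

# Illustrative input only.
printf '%s\n' \
  '812 openat(AT_FDCWD, "/etc/myservice.conf", O_RDONLY) = 3 <0.000021>' \
  '812 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.0.2")}, 16) = 0 <0.000040>' \
  '812 poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005101>' \
  '812 poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.004873>' |
  time_by_syscall
```

On a real log you would run time_by_syscall /tmp/startup.log. In the sample, poll ends up at the top, which is the pattern a DNS or network timeout during startup tends to produce.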

Scenario 4: Config file not being read

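You updated a config file but the application ignores it. Verify which files the process actually opens. Most programs read their configuration only at startup or on reload, so attach first and then trigger a reload (or start the program under strace):

strace -f -e trace=openat -p "$PID" 2>&1 | grep "\.conf\|\.yml\|\.json"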
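Seeing the lookup order with each result often explains the problem: the application may find an earlier file in its search path, or fail to find yours at all. The sketch below prints one line per config-like path with either OK or the errno name. config_lookups is a hypothetical helper, the extension list simply mirrors the grep above, and the sample lines are illustrative.

```shell
# config_lookups: from strace output, list open/openat calls on config-like
# files in the order they happened, prefixed with OK or the errno name.
config_lookups() {
  awk '
    /open(at)?\(/ && /\.(conf|ya?ml|json)"/ {
      if (!match($0, /"[^"]*"/)) next               # first quoted string = path
      path = substr($0, RSTART + 1, RLENGTH - 2)
      status = "OK"
      if (match($0, /= -1 E[A-Z0-9]+/))             # e.g. "= -1 ENOENT"
        status = substr($0, RSTART + 5, RLENGTH - 5)
      printf "%s %s\n", status, path
    }
  ' "$@"
}

# Illustrative input only.
printf '%s\n' \
  'openat(AT_FDCWD, "/etc/myapp/myapp.yml", O_RDONLY) = -1 ENOENT (No such file or directory)' \
  'openat(AT_FDCWD, "/usr/local/etc/myapp.yml", O_RDONLY) = 3' \
  'openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3' |
  config_lookups
```

In the sample, the file under /etc is reported as ENOENT and the one under /usr/local/etc as OK, which would mean edits to the /etc copy never reach the application.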

Performance Caveats

strace adds significant overhead. On a busy production server, tracing every syscall can slow a process by 10-100x. Use these techniques to minimize impact:

  • Use -e trace= to filter only relevant syscalls
  • Use -c (summary mode) instead of per-call tracing
  • Trace for short bursts — seconds, not minutes
  • Use timeout 5 strace -p $PID to detach automatically after 5 seconds
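One way to make short bursts the default is a small wrapper around timeout(1) from GNU coreutils. This is a sketch under assumptions: strace_burst and the DRY_RUN switch are hypothetical names introduced here, and it relies on strace detaching from the target when it receives the SIGTERM that timeout sends.

```shell
# strace_burst SECONDS PID [STRACE_ARGS...]
# Attach to PID and stop tracing after SECONDS.
# Set DRY_RUN=1 to print the command instead of running it.
strace_burst() {
  secs=$1
  pid=$2
  shift 2
  # timeout sends SIGTERM when the limit expires; strace then detaches
  # and the traced process keeps running.
  ${DRY_RUN:+echo} timeout "$secs" strace "$@" -p "$pid"
}

# Show the command that a 5-second, network-only trace of PID 2345 would run.
DRY_RUN=1 strace_burst 5 2345 -f -e trace=network
```

Keeping the filter flags as pass-through arguments means the same wrapper works for -c summaries and for filtered per-call traces.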

Alternatives

  • perf — lower overhead, kernel-level profiling
  • ltrace — trace library calls instead of syscalls
  • bpftrace — eBPF-based, modern, production-safe tracing

strace is not a tool you leave running. It is a surgical instrument — apply it briefly, gather data, and move on.