How We Set Up Our KVM Hypervisor: From Bare Metal to Production-Ready VM Host
We recently built a dedicated KVM/libvirt hypervisor for running virtual machines: development environments, test labs, and container hosts. The goal was simple: take a bare metal server, tune every layer of the stack, and turn it into a production-ready VM host that doesn't waste a single CPU cycle on overhead.
| Component | Spec |
|---|---|
| CPU | AMD EPYC 7351P (16 cores / 32 threads, 4 NUMA nodes) |
| RAM | 125 GiB DDR4 ECC |
| Network | Intel X520 10GbE (ixgbe driver), single active port + bridge |
| Storage | 2x WDC SN720 NVMe 512 GB (PCIe 3.0 x4) |
| OS | Debian 12, kernel 6.12.90 |
For the VM image filesystem (/var/lib/libvirt):
noatime,allocsize=1m,largeio,inode64,logbufs=8,logbsize=32k,noquota
noatime: Without it, guest reads of a qcow2 file can make the host write an atime update (with the default relatime this happens less often than on every read, but it still happens). With noatime, those writes are gone entirely.
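allocsize=1m: Pins XFS speculative preallocation at 1 MiB when a file is extended, instead of leaving it to the default dynamic heuristic. qcow2 images grow cluster by cluster (64 KiB by default, configurable up to 2 MiB), so a growing image is extended in larger contiguous chunks. Compared with growing one 4 KiB block at a time, that is up to 256x fewer allocations and far less fragmentation.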
nobarrier (deliberately not used): The WDC SN720 is a consumer NVMe drive with no power-loss protection. Barriers (cache flushes and FUA writes) keep the XFS journal consistent if the server suddenly loses power, so we left them on.
discard (deliberately not used): Synchronous TRIM on every unlink destroys write latency. Instead, we enabled fstrim.timer for a weekly batch TRIM.
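To confirm that a mounted filesystem actually carries the options listed above, and none of the two that are better left off, a small check along these lines can help. This is a sketch, not part of the original scripts: the function names `has_opt` and `audit_opts` are made up for illustration, and the check matches option names only, because the kernel may report values in a different form than they were written in fstab.

```shell
#!/bin/sh
# has_opt OPTIONS NAME: succeed if NAME (bare or as NAME=value) appears
# in the comma-separated OPTIONS string.
has_opt() {
    case ",$1," in
        *",$2,"*|*",$2="*) return 0 ;;
        *) return 1 ;;
    esac
}

# audit_opts OPTIONS: print wanted options that are missing and unwanted
# options that are present; returns non-zero if anything was printed.
audit_opts() {
    opts=$1
    rc=0
    for want in noatime allocsize largeio inode64 logbufs logbsize noquota; do
        has_opt "$opts" "$want" || { echo "missing: $want"; rc=1; }
    done
    for avoid in nobarrier discard; do
        if has_opt "$opts" "$avoid"; then
            echo "unwanted: $avoid"
            rc=1
        fi
    done
    return $rc
}
```

On the host this could be run as `audit_opts "$(findmnt -no OPTIONS /var/lib/libvirt)"`. Whether the weekly TRIM is scheduled can be checked separately with `systemctl is-enabled fstrim.timer`.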
The second NVMe was formatted with mkfs.xfs -m reflink=1. This enables reflink copies: instant, copy-on-write clones of VM images.
cp --reflink=always debian12-base.qcow2 test-vm.qcow2
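Before relying on instant clones, it is worth checking that the filesystem really has reflink enabled. The sketch below assumes `xfs_info` output is piped in. The helper names are illustrative, and the mount point in the usage line is an assumption about where the image store lives.

```shell
#!/bin/sh
# reflink_enabled: read `xfs_info` output on stdin; succeed if the
# filesystem reports reflink=1.
reflink_enabled() {
    grep -q 'reflink=1'
}

# clone_image BASE NEW: copy-on-write clone. --reflink=always makes cp
# fail instead of silently falling back to a full data copy.
clone_image() {
    cp --reflink=always -- "$1" "$2"
}
```

Usage might look like `xfs_info /var/lib/libvirt | reflink_enabled && clone_image debian12-base.qcow2 test-vm.qcow2`. Using `always` rather than `auto` is a deliberate choice here: a failed clone is easier to notice than a 40 GB copy that quietly takes minutes.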
We allocated 121 GiB of the 125 GiB total RAM to hugepages, leaving 4 GiB for the host OS.
GRUB_CMDLINE_LINUX="... hugepages=61952"
After update-grub and a reboot:
HugePages_Total: 61952
HugePages_Free: 61952
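The page count follows directly from the sizes: 121 GiB divided by the 2 MiB default huge page size on x86_64 is 61952 pages. A minimal sketch of that arithmetic, plus a reader for the `/proc/meminfo` line shown above (function names are illustrative, and the 2 MiB page size is an assumption that no `default_hugepagesz` override is in place):

```shell
#!/bin/sh
# hugepage_count GIB [PAGE_KIB]: number of huge pages needed to cover
# GIB gibibytes. PAGE_KIB defaults to 2048 (2 MiB pages).
hugepage_count() {
    echo $(( $1 * 1024 * 1024 / ${2:-2048} ))
}

# hugepages_total: read /proc/meminfo-style text on stdin and print the
# HugePages_Total value.
hugepages_total() {
    awk '$1 == "HugePages_Total:" { print $2 }'
}

hugepage_count 121   # prints 61952
```

After the reboot, `hugepages_total < /proc/meminfo` should print the same number. A smaller value means the kernel could not reserve every page.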
We enabled zswap (lz4 compression, zsmalloc allocator, 20% pool limit) via kernel cmdline.
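For reference, the zswap settings named above map onto kernel parameters as sketched below. The parameter names come from the kernel's zswap documentation, but the exact string used on this host is not shown in the article, so treat the output of `zswap_cmdline` as an assumption. `zswap_show` is an illustrative helper for reading the live values back.

```shell
#!/bin/sh
# zswap_cmdline: print kernel parameters for lz4 compression, the
# zsmalloc allocator and a 20% pool limit.
zswap_cmdline() {
    echo "zswap.enabled=1 zswap.compressor=lz4 zswap.zpool=zsmalloc zswap.max_pool_percent=20"
}

# zswap_show [DIR]: print each readable zswap parameter as name=value.
# DIR defaults to the live sysfs directory.
zswap_show() {
    dir=${1:-/sys/module/zswap/parameters}
    for f in "$dir"/*; do
        [ -r "$f" ] && echo "$(basename "$f")=$(cat "$f")"
    done
    return 0
}
```

The string from `zswap_cmdline` would be appended to the same GRUB variable that carries the hugepages setting. Running `zswap_show` after the reboot confirms what the kernel actually accepted.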
net.core.rmem_max = 134217728 # 128 MB
net.core.wmem_max = 134217728 # 128 MB
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.busy_read = 50
net.core.busy_poll = 50
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
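Settings like these are easy to lose to a later package or profile change, so it can be useful to compare a settings file against the running kernel. The following is a rough sketch under a few assumptions: the file uses the `key = value  # comment` layout shown above, and `SYSCTL_READER` is a hypothetical hook (not a real sysctl feature) that lets the lookup command be swapped out for a dry run.

```shell
#!/bin/sh
# squeeze TEXT: collapse runs of tabs and spaces and trim the ends, so
# that "4096<TAB>65536" and "4096 65536" compare equal. sysctl prints
# multi-value keys separated by tabs.
squeeze() {
    printf '%s\n' "$1" | tr -s '\t ' '  ' | sed 's/^ //; s/ $//'
}

# check_sysctls: read settings on stdin and print every key whose live
# value differs from the wanted one. Returns non-zero on any mismatch.
check_sysctls() {
    rc=0
    while IFS= read -r line; do
        line=${line%%#*}                          # drop trailing comment
        case $line in *=*) ;; *) continue ;; esac # skip lines without "="
        key=$(squeeze "${line%%=*}")
        want=$(squeeze "${line#*=}")
        have=$(squeeze "$(${SYSCTL_READER:-sysctl -n} "$key" 2>/dev/null)")
        if [ "$want" != "$have" ]; then
            echo "$key: want '$want', have '$have'"
            rc=1
        fi
    done
    return $rc
}
```

With the settings saved in a drop-in file (for example a hypothetical `/etc/sysctl.d/90-kvm-net.conf`), `check_sysctls < /etc/sysctl.d/90-kvm-net.conf` prints nothing when everything matches. The two `net.bridge.*` keys only exist while the `br_netfilter` module is loaded, so they show up as mismatches with an empty live value otherwise.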
The performance governor locks all cores at maximum frequency.
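A quick way to see whether every core really runs the performance governor is to read the cpufreq entries in sysfs. This is a sketch with an illustrative function name. The optional directory argument exists only so the function can be pointed at a copy of the tree.

```shell
#!/bin/sh
# non_performance_cpus [CPU_DIR]: list CPUs whose cpufreq governor is
# not "performance". CPU_DIR defaults to /sys/devices/system/cpu.
non_performance_cpus() {
    base=${1:-/sys/devices/system/cpu}
    for g in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$g" ] || continue
        gov=$(cat "$g")
        if [ "$gov" != "performance" ]; then
            cpu=${g#"$base"/}          # e.g. cpu3/cpufreq/scaling_governor
            echo "${cpu%%/*}: $gov"    # e.g. cpu3: schedutil
        fi
    done
}
```

Empty output means all cores that expose cpufreq are on the performance governor. One common way to set it by hand is `cpupower frequency-set -g performance`, assuming the cpupower tool is installed.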
kernel.numa_balancing = 0
kernel.timer_migration = 0
kernel.sched_autogroup_enabled = 0
/sys/module/kvm/parameters/halt_poll_ns = 200000
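The `halt_poll_ns` line above is shorthand for writing a value into a module parameter file. A minimal sketch of doing that at runtime and of making it persistent follows. The function names and the modprobe.d file name in the usage note are assumptions, not taken from the original scripts.

```shell
#!/bin/sh
# set_halt_poll NS [PARAM_FILE]: write the polling window in nanoseconds
# to the kvm module parameter. Needs root for the real sysfs file.
set_halt_poll() {
    printf '%s\n' "$1" > "${2:-/sys/module/kvm/parameters/halt_poll_ns}"
}

# modprobe_line NS: print an options line for a modprobe.d file, so the
# value is applied whenever the kvm module loads.
modprobe_line() {
    echo "options kvm halt_poll_ns=$1"
}
```

As root, `set_halt_poll 200000` changes the running value, and `modprobe_line 200000 > /etc/modprobe.d/kvm-halt-poll.conf` keeps it across reboots.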
Kernel Samepage Merging adds unpredictable CPU overhead.
ufw default deny incoming
ufw default allow outgoing
ufw allow from 127.0.0.0/8 to any port 22
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
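Because sshd merges its main config with any drop-in files, the effective settings are best read back from `sshd -T`, which prints them as lowercase `key value` lines. The sketch below compares that output with the six values above. `sshd_mismatches` is an illustrative name.

```shell
#!/bin/sh
# sshd_mismatches: read `sshd -T` output on stdin and print every
# hardening setting that differs from the wanted value or is missing.
sshd_mismatches() {
    awk '
        BEGIN {
            want["permitrootlogin"]        = "no"
            want["passwordauthentication"] = "no"
            want["pubkeyauthentication"]   = "yes"
            want["maxauthtries"]           = "3"
            want["clientaliveinterval"]    = "300"
            want["clientalivecountmax"]    = "2"
        }
        ($1 in want) {
            seen[$1] = 1
            if ($2 != want[$1])
                print $1 ": want " want[$1] ", have " $2
        }
        END {
            for (k in want)
                if (!(k in seen))
                    print k ": not reported"
        }
    '
}
```

A typical run would be `sudo sshd -t && sudo sshd -T | sshd_mismatches`. The `sshd -t` step validates the syntax first, which is worth doing before restarting the daemon on a host that only accepts key logins.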
Three deployment scripts: tune-xfs.sh, tune-kvm.sh, and harden-ssh.sh.
Plus matching Ansible playbooks for repeatable deployment.
sudo bash tune-xfs.sh
sudo bash tune-kvm.sh
sudo bash harden-ssh.sh
sudo update-grub && sudo reboot
Hugepages must be allocated at boot. The kernel cmdline approach is the only reliable way.
nobarrier on consumer NVMe is Russian roulette. Without barriers, a sudden power loss could corrupt the XFS journal.
Cockpit pulls in tuned, so be ready for it. We fixed it by migrating everything into a custom tuned profile.
Reflink cloning on XFS is a superpower. Clone 40 GB VM images in under a second.